Abstract


We present computationally efficient error-correcting codes and holographic proofs. Our 
error-correcting codes are asymptotically good and can be encoded and decoded in linear 
time. Our construction of holographic proofs provides, for every proof of any theorem, a
slightly larger “holographic” proof whose accuracy can be probabilistically checked by an 
algorithm that only reads a constant number of the bits of the holographic proof and runs 
in poly-logarithmic time (such proofs have also been called “transparent proofs” and “prob- 
abilistically checkable proofs”). We explain how these constructions are related and how 
improvements of these constructions should result in a strengthening of this relationship. 

For every constant r such that 0 < r < 1, we construct an infinite family of systematic 
linear block error-correcting codes that have an encoding circuit with a linear number of 
wires. There is a constant $\epsilon > 0$ and a linear-time decoding algorithm for these codes that
maps every word of relative distance at most $\epsilon$ from a codeword to that codeword. The
encoding circuits have logarithmic depth. The decoding algorithm can be implemented as
a circuit with $O(n \log n)$ wires and logarithmic depth. These constructions make use of
explicit constructions of expander graphs and superconcentrators. 

Our constructions of holographic proofs improve on the theorem $PCP(\log n, 1) = NP$,
proved by Arora, Lund, Motwani, Sudan, and Szegedy, by providing, for every $\epsilon > 0$,
constant-query checkable proofs of size $O(n^{1+\epsilon})$. That is, we design a probabilistic
poly-logarithmic time proof checking algorithm that takes two inputs: a theorem candidate and a
proof candidate. After reading a constant number of bits from each input, the proof checker
decides whether to accept or reject its inputs. For every rigorous proof of length $n$ of any
theorem, there is an easily computable holographic proof of that theorem of size $O(n^{1+\epsilon})$
such that, with probability one, the proof checker will accept the holographic proof and 
an encoding of the theorem. Conversely, if the proof checker accepts a theorem candidate 
and a proof candidate with probability greater than one-half, then the theorem candidate 
is close to a unique encoding of a true theorem and the proof candidate constitutes a proof 
of that theorem.

CHAPTER 1


Introduction 


Mathematical studies of proofs and error-correcting codes played an important role in the 
development of theoretical computer science. In this dissertation, we develop ideas from 
theoretical computer science and apply them to the construction of error-correcting codes 
and proofs. Our goal is to construct error-correcting codes that can be encoded and decoded 
and proofs that can be verified as efficiently as possible. The error-correcting codes that we 
build are the first known asymptotically good family of error-correcting codes that can be 
encoded and decoded in linear time. Our construction of holographic proof systems enables 
one to transform any rigorous proof system into one whose proofs, while only slightly longer, 
can be probabilistically checked by the examination of only a constant number of their bits. 
We conclude by explaining how these constructions of error-correcting codes and holographic
proofs are related.


1.1 Error-correcting codes


Error-correcting codes were introduced to deal with a fundamental problem in commu- 
nication: when a message is sent from one place to another, it is often distorted along 
the way. An error-correcting code provides a systematic way of adding information to a 
message so that even if part of the message is corrupted in transmission, the receiver can
nevertheless figure out what the sender intended to transmit. Naturally, the probability
that the receiver can recover the original message decreases as the amount of distortion
increases. Similarly, the amount of distortion that the receiver can tolerate increases as 
more redundant information is added to the transmitted message. 

In his seminal 1948 paper, A Mathematical Theory of Communication, Shannon [Sha48] 
proved tight bounds on the amount of redundancy needed to tolerate a given amount of cor- 
ruption in discrete communication channels. Shannon modeled the communication problem 
as a situation in which one is trying to send information from a source to a destination over a 
channel that occasionally becomes corrupted by noise (see Figure 1-1). Shannon described
a coding scheme in which the transmitter breaks the original message into pieces, adds some 
redundant information to each piece to form a codeword, and then sends the codewords. 
The codewords are chosen so that if the received signal does not differ too much from the 
sent signal, the receiver will be able to remove the distortion from the received signal, fig- 
ure out which codeword was sent, and pass the corrected message on to the destination. 


[Figure 1-1: Shannon’s schematic of a communication system. Labeled components: information source, transmitter, channel, noise source, receiver, destination. Labeled flows: message, signal, received signal, message.]

Shannon defined a notion called the capacity of a channel, which decreases as the amount
of noise on the channel increases, and demonstrated that it is impossible to reliably send
information from the source to the destination at a rate that exceeds the capacity of the 
channel. Thus, the capacity of the channel determines how much redundant information 
the transmitter needs to add to each piece of a message. Remarkably, Shannon was able 
to demonstrate that there is a coding scheme that enables one to reliably send information 
from the source to the destination at any rate lower than the capacity. Moreover, using 
such a coding scheme, one can arbitrarily reduce the probability that a message will fail to
reach the destination without adding any extra redundant information: one need merely
increase the size of the pieces into which the message is broken.

But, there was a catch. While Shannon demonstrated that it is possible to encode longer 
and longer pieces of messages, he did not present an efficient means for doing so. Thus, the
study of error-correcting codes was born in an attempt to find coding schemes that approx- 
imate the performance promised by Shannon and to find efficient implementations of these 
coding schemes. In this thesis, we present the first such coding scheme in which the compu- 
tational effort needed to encode and decode corrupted codewords is strictly proportional to 
the length of the codewords. This enables us to arbitrarily reduce the probability of error in 
communication without decreasing the rate of transmission or increasing the computational
work required.


1.2 The purpose of proofs 


When we construct error-correcting codes, we are concerned with the transmission of a 
message, but we ignore its content. In the second part of this thesis, we explore how a 
receiver can become convinced that a message has a particular content while only reading 
a small portion of the message.¹ We formalize this by requiring that the message be a
demonstration of some fact, and we consider the content of the message to be the truth of 
that fact. That is, we treat the message as a proof. 

A proof is the means by which one mathematician demonstrates the truth of an assertion 
to another. Ideally, it should be much easier for the other to become convinced of the truth 
of the assertion by reading a proof of it than by attempting to derive the veracity of the 
assertion from scratch. Thus, we view a proof as a labor-saving device: by providing proofs, 
we provide others with the fruits of our labor, but spare them its pains². By including extra
details and background material, we make our proofs accessible to a broader audience. If 
we supply sufficient detail, then one almost completely ignorant of mathematics should be 
able to check the veracity of our claims. 

¹ Some may try to do this by reading only the introduction of the message, but one cannot be sure that
the body will fulfill the promises made in the introduction.
² We hope that this dissertation achieves this goal.

In principle, any mathematical claim and accompanying proof can be expressed using
a few axioms and a simple calculus. The verifier of such a proof need only be able to
check that certain simple string operations are performed correctly and that strings have 
been faithfully copied from one part of the proof to another. By making the proofs slightly 
longer, it is possible to construct formats for proofs in which the verifier only needs to check 
that a collection of pre-specified conditions are met, each of which only involves a constant 
number of bits of the proof. At this point, it seems that we have made the task of the 
verifier as simple as is conceivably possible. But, by allowing for a little uncertainty, we can 
make it even simpler. 

The complexity class non-deterministic polynomial time, usually called NP, roughly 
captures the set of facts that have proofs whose size is polynomial in the length of their 
statement. Similarly, the class non-deterministic exponential time, NEXP, captures the set
of facts that have proofs whose size is exponential in the size of the statement of the fact. 
In their proof that NEXP = MIP, Babai, Fortnow, and Lund [BFL91] demonstrated that 
it is possible to probabilistically verify the truth of one of these exponentially long proofs 
while only reading a polynomial number of the bits of the proof. Without reading the whole 
proof, the verifier cannot be certain that it is true. But, by reading a small portion of the 
proof, the verifier can be very confident that it is true. That is, the verifier will always 
accept a correct proof of a true statement; but, if the verifier examines a purported proof 
of a false statement, then the verifier will reject the purported proof with high probability. 

The following year, Babai, Fortnow, Levin, and Szegedy [BFLS91] explained how sim- 
ilar techniques could be combined with some new ideas to construct transparent proofs of 
any mathematical statement. Their transparent proof of a statement was only slightly 
longer than the formal proof from which it was derived, but it could be probabilistically 
verified in time poly-logarithmic (very small) in the size of the proof and the degree of 
confidence desired. In a related series of papers by Feige, Goldwasser, Lovász, Safra,
and Szegedy [FGL+91], Arora and Safra [AS92b], and Arora, Lund, Motwani, Sudan, and
Szegedy [ALM+92], proofs were created that could be probabilistically verified by
examining only a constant number of randomly chosen bits of the proof.³ However, the size of
these proofs was at least quadratic in the size of the original proofs.

³ These types of proofs have gone by the names transparent proofs, probabilistically checkable proofs, and
holographic proofs. We prefer the name holographic proofs, introduced by Levin, because, as in a hologram,
each bit of information in a holographic proof is reflected in the entire structure. Some authors reserve the
term probabilistically checkable for proof systems in which the proof checker reads the statement to be proved
and use transparent or holographic to describe systems in which the statement of the theorem to be proved
is encoded so that the proof checker only needs to read a constant number of its bits.

We construct proofs that combine the advantages of being nearly linear in the size of
the original proofs and having verifiers that run in poly-logarithmic time and read only a
constant number of bits of the proof.

Note that we do not advocate the use of such proof systems by practicing
mathematicians. A mathematician is rarely satisfied with merely knowing that a fact is true. The
excitement of mathematics lies in understanding why. While one might obtain such an
understanding by reading a holographic proof, it would be much simpler to read a proof
written in plain language designed for easy comprehension. We study holographic proofs
both because we want to find out how little work one need do to verify a fact and because,
in the process of studying this fundamental question, we obtain results that have important
implications in other areas.

Among the techniques used to construct and analyze holographic proofs, we would like to
point out the importance of those derived from the study of: checkable,
self-testable/self-correctable, and random-self-reducible functions [BK89, RS92, BLR90, Rub90, GLR+91,
GS92, Lip91, BF90, Sud92, She91]; techniques for reducing our dependence on
randomness [IZ89, Zuc91]; interactive proofs [Bab85, BM88, GMR89, GMR85, BGKW88, FRS88,
LFKN90, Sha90, FL92, LS91, BFL91]; and error-correcting codes [GS92, BW, BFLS91].
Through many of these works one finds a common algebraic thread inspired by the work of
Schwartz [Sch80]. The crux of our construction of holographic proofs is a purely algebraic
statement: Theorem 4.2.19.

The main application of constructions of holographic proofs has been to prove the
hardness of finding approximate solutions to certain optimization problems. Notable papers
in this direction include [PY91, FGL+91, AS92b, ALM+92, LY94, Zuc93, BGLR93, F94,
BS94, ABSS93]. Holographic proofs have also been applied to problems in many areas of
complexity theory [CFLS93, CFLS94, Kil92, Kil94, Mic94, KLR+94]. While not properly an
application of holographic proof technology, our constructions of error-correcting codes were
inspired by these proofs. They were born of an attempt to find an alternative construction
of holographic proofs, and we hope that they one day aid in this effort.


1.3 Structure of this thesis 


We begin in Section 1.4 with an introduction to the field of error-correcting codes from the 
perspective of this complexity theorist. In Section 1.5, we describe some of what is known 
about the complexity of error-correcting codes. The goal of this introduction is to provide 
a context for the results in Chapters 2 and 3. 

Chapters 2 and 3 are devoted to our construction of linear-time encodable and decodable 
error-correcting codes. In Chapter 2, we describe a construction of codes that can be 
decoded in linear time, but for which we only know quadratic-time encoding algorithms. 
These codes can also be decoded in logarithmic time by a linear number of processors. In this 
chapter, we introduce many of the techniques that we will use in Chapter 3. In particular, 
we present a relation between expander graphs and error-correcting codes. Along the way, 
we survey some of what is known about expander graphs. As we feel that these codes 
might be useful for coding on write-once media, we conclude the chapter by presenting 
some thoughts on how one might go about implementing these codes. 

In Chapter 3, we present our construction of linear-time encodable and decodable error- 
correcting codes. These codes can also be encoded and decoded in logarithmic time with 
a linear number of processors. We call these codes superconcentrator codes because their 
encoding circuits bear a strong resemblance to superconcentrators. As part of our construc- 
tion, we develop a type of code that we call an error-reducing code. An error-reducing code 
enables one to quickly remove most of the errors from a corrupted codeword, but it need 
not allow full error-correction. We construct superconcentrator codes by carefully piecing 
together appropriately chosen error-reducing codes. Again, we conclude this chapter with 
some thoughts on how one might implement these codes and on how they could be improved. 

Chapter 4 is devoted to our construction of nearly linear size holographic proofs. This 
chapter is somewhat more involved than Chapters 2 and 3, but it should be intelligible
to the reader with a reasonable background in theoretical computer science. We begin by
describing holographic proofs and by defining some of the types of error-correcting codes
that they are related to. A potentially useful type of error-correcting code derived from 
holographic proofs is a checkable code. Checkable codes have associated randomized algo- 
rithms that, after examining only a constant number of randomly chosen bits of a received 
word, can make a good estimation of the probability that a decoder will successfully decode 
the word. One could use such an algorithm to decide whether to request the retransmis- 
sion of a word even before the decoder has worked on it. In Section 4.2, we develop the 
algebraic machinery that we will need to construct our holographic proofs. We show that 
certain polynomial codes are somewhat checkable and verifiable. These codes are used in 
Section 4.3 to construct our basic holographic proof system. We apply this system to itself 
recursively to construct our efficient holographic proofs. In the final sections, we present 
a few variations of our construction, each more powerful, but more complicated, than the 
previous. 

In the last chapter of this thesis, we explain some of the connections between our con- 
structions of error-correcting codes and holographic proofs. We explain the similarities 
between expander codes and the checkable codes derived from holographic proofs. We also 
discuss whether checkable codes are a necessary component of holographic proofs. Our 
conclusion is that the problems of constructing more efficient checkable codes, holographic
proofs, and expander codes are strongly linked, and could even have the same solution.

1.4 An introduction to error-correcting codes


The purpose of this section is to provide the reader with a convenient reference for the basic 
definitions from the field of error-correcting codes that we will use in this thesis. Those who 
understand the phrase “asymptotically good family of systematic linear error-correcting 
codes” can probably skip this section. 

Intuitively, a good error-correcting code is a large set of words such that each pair differs 
in many places. Let $\Sigma$ be a finite alphabet with $q$ letters. A code of length $n$ over $\Sigma$ is a
subset of $\Sigma^n$. Throughout most of Chapters 2 and 3, we will discuss codes over the alphabet
{0,1}, which are called binary codes. Unless otherwise stated, all the codes that we discuss 
should be assumed to be binary. 

Two important parameters of a code are its rate and minimum distance. The rate of a 
code $C$ is $(\log_q |C|)/n$. The rate indicates how much information is contained, on average,
in each code symbol. The minimum distance of a code $C$ is

$$\min_{x \neq y \in C} d(x, y),$$

where $d(x,y)$ is the Hamming distance between two words (i.e., the number of places in
which they differ). We will usually talk about the relative minimum distance of a code,
which is its minimum distance divided by $n$.
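
As an illustration of these two parameters, the following minimal sketch computes the rate and the minimum distance of a code given as an explicit list of words. The function names and the two toy codes are assumptions made for this example; they do not appear in the thesis, and the pairwise search is only practical for very small codes.

```python
from itertools import combinations
from math import log


def hamming_distance(x, y):
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(x, y))


def rate(code, q=2):
    """(log_q |C|) / n for a code given as a list of equal-length words."""
    n = len(code[0])
    return log(len(code), q) / n


def minimum_distance(code):
    """Minimum Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))


# Hypothetical toy codes, chosen only for illustration.
repetition3 = ["000", "111"]                 # length-3 repetition code
even_weight3 = ["000", "011", "101", "110"]  # all even-weight words of length 3

for name, code in [("repetition", repetition3), ("even weight", even_weight3)]:
    n = len(code[0])
    d = minimum_distance(code)
    print(name, "rate:", rate(code), "relative minimum distance:", d / n)
```

The two examples show the trade-off that the definitions are meant to capture: the repetition code has the larger distance and the smaller rate.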

Since we are interested in the asymptotic performance of codes, we define a family of 
error-correcting codes to be an infinite sequence of error-correcting codes that contains at 
most one code of any length. If $\{C_i\}$ is a family of error-correcting codes such that, for all
$i$, the rate of $C_i$ is greater than $r$, then we say that the family has rate at least $r$. Similarly,
if the relative minimum distance of each code in the family is at least $\delta$, then we say that
the relative minimum distance of the family is at least $\delta$. An infinite family of codes over
a fixed alphabet is called asymptotically good if there exist positive constants $r$ and $\delta$ such
that the family has rate and relative minimum distance at least $r$ and $\delta$ respectively.⁴ A
central problem of coding theory has been to find explicit constructions of asymptotically
good families of error-correcting codes with as large rate and relative minimum distance as
possible.

⁴ We will occasionally say good code when we mean asymptotically good code.

The preceding definitions are standard. We will now make some less standard definitions 
that will help us discuss the complexity of encoding and decoding error-correcting codes. 


An encoding function for a code $C$ of rate $r$ is a bijection

$$f : \Sigma^{rn} \to C.$$

We say that an encoding function is systematic if there exist indices $i_1, \ldots, i_{rn}$ such that for
all $x = (x_1, \ldots, x_{rn}) \in \Sigma^{rn}$,

$$(x_1, \ldots, x_{rn}) = (f(x)_{i_1}, \ldots, f(x)_{i_{rn}}).$$


That is, the message is embedded in the codeword. If a code has a systematic encoding 
function, then we say that the code is systematic. We can think of a systematic code as 
being divided into rn “message symbols” and (1—r)n “check symbols”. The check symbols 
are uniquely determined by the message symbols, and we can view the message symbols 
as containing the information content of the codeword (in a binary code, we will call these 
message bits and check bits). 


An error-correcting function for a code $C$ is a function

$$g : \Sigma^n \to C \cup \{?\}$$

such that $g(x) = x$, for all $x \in C$. We say that an error-correcting function for $C$ can correct
$m$ errors if for all $x \in C$ and all $y \in \Sigma^n$ such that $d(x,y) \le m$, we have $g(y) = x$. We allow
an error-correcting function to return the value “?” so that it can describe the output of
an algorithm that could not find an element of $C$ close to its input. We use the verb decode
loosely to indicate the process of correcting a constant fraction of errors. When we say a
“decoding algorithm”, we mean an algorithm that can correct some constant fraction of
errors. From the perspective of a complexity theorist, the fraction of errors that can be
efficiently corrected in a family of error-correcting codes is much more interesting than the
actual minimum distance of those codes.
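
To make the definition of an error-correcting function concrete, here is a minimal brute-force sketch. It is not a decoding algorithm from this thesis: the name `make_error_corrector`, the toy code, and the exhaustive search over all codewords are assumptions for illustration. The resulting function corrects `m` errors provided that `2 * m` is less than the minimum distance of the code, and returns "?" when no unique codeword lies within distance `m`.

```python
def hamming_distance(x, y):
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(x, y))


def make_error_corrector(code, m):
    """Build g: words -> code + {"?"} by searching every codeword.

    g(y) is the unique codeword within distance m of y, or "?" if there
    is no such codeword (or more than one).
    """
    def g(y):
        close = [x for x in code if hamming_distance(x, y) <= m]
        return close[0] if len(close) == 1 else "?"
    return g


# Hypothetical example: the length-6 repetition code has minimum distance 6,
# so it can correct m = 2 errors.
repetition6 = ["000000", "111111"]
g = make_error_corrector(repetition6, 2)

print(g("000000"))  # a codeword is mapped to itself
print(g("010010"))  # two errors away from 000000
print(g("000111"))  # three errors from both codewords
```

The last call shows why the value "?" is allowed: the word is too far from every codeword for this function to commit to an answer.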


1.4.1 Linear codes 


The codes that we construct will be linear codes. A linear code is a code whose alphabet 
is a field and whose codewords form a vector space over this field. There are two natural 
ways to represent a linear code: either by listing a basis of the space and defining the code 
to be all linear combinations of those basis vectors, or by listing a basis of the dual space. 
A matrix whose rows are a basis of the dual space is called a check matrix of the code. 

In a binary code, each check bit is just a sum modulo 2 of a subset of the message bits. 


Some elementary facts that we will use about linear binary codes are:

• The zero vector is always a codeword.

• They are systematic. To see this, row reduce the $(1-r)n \times n$ check matrix until it
contains $(1-r)n$ columns that contain exactly one 1. These columns correspond to
the check bits, and the remaining $rn$ columns correspond to the message bits. Observe
that row-reducing the check matrix does not change the code that it defines. It is now
clear that for any setting of the message bits, there is a unique setting of the check
bits so that the resulting vector is orthogonal⁵ to every row in the row-reduced check
matrix.

• They can be encoded using $O(n^2)$ work: there is an $rn \times (1-r)n$ matrix such that
the check bits can be computed from the message bits by multiplying the vector of
$rn$ message bits by this matrix. (This is the matrix that appears in the columns
corresponding to the message bits described in the previous item.)

• The minimum distance of a linear code is equal to the minimum weight of a non-zero
codeword (the weight of a codeword $w$ is $d(0,w)$; note that $d(v,w) = d(0, w-v)$).

⁵ We say that two vectors are orthogonal if their inner product is zero. Over a finite field, this loses its
geometric significance: a vector can be orthogonal to itself!
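
The facts listed above can be checked on a small example. The following sketch row-reduces a check matrix over GF(2), uses the reduced matrix to encode systematically, and finds the minimum distance as the minimum weight of a non-zero codeword. The function names and the choice of the [7,4] Hamming code as the example are assumptions for illustration, not part of the thesis; the exhaustive weight search is exponential in the number of message bits.

```python
from itertools import product


def row_reduce_gf2(H):
    """Return (R, pivots): the reduced row echelon form of H over GF(2),
    with all-zero rows dropped, and the list of pivot columns."""
    R = [row[:] for row in H]
    pivots = []
    r = 0
    for col in range(len(R[0])):
        if r == len(R):
            break
        # Find a row at or below r with a 1 in this column.
        pivot = next((i for i in range(r, len(R)) if R[i][col]), None)
        if pivot is None:
            continue
        R[r], R[pivot] = R[pivot], R[r]
        # Clear this column in every other row.
        for i in range(len(R)):
            if i != r and R[i][col]:
                R[i] = [a ^ b for a, b in zip(R[i], R[r])]
        pivots.append(col)
        r += 1
    return R[:r], pivots


def encode(H, message):
    """Systematic encoding: pivot columns hold check bits, the remaining
    columns hold the message bits unchanged."""
    R, check_cols = row_reduce_gf2(H)
    n = len(H[0])
    message_cols = [j for j in range(n) if j not in check_cols]
    assert len(message) == len(message_cols)
    word = [0] * n
    for j, bit in zip(message_cols, message):
        word[j] = bit
    # Each reduced row contains exactly one check column, so its check bit
    # is the parity of the message bits that the row touches.
    for row, c in zip(R, check_cols):
        word[c] = sum(row[j] & word[j] for j in message_cols) % 2
    return word


def minimum_weight(H):
    """Minimum weight of a non-zero codeword (equals the minimum distance)."""
    _, check_cols = row_reduce_gf2(H)
    k = len(H[0]) - len(check_cols)
    return min(sum(encode(H, list(m)))
               for m in product([0, 1], repeat=k) if any(m))


# Hypothetical example: a check matrix of the [7,4] Hamming code.
HAMMING_CHECK = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

codeword = encode(HAMMING_CHECK, [1, 0, 0, 0])
print(codeword, minimum_weight(HAMMING_CHECK))
```

Each call to `encode` does work quadratic in $n$ once the row reduction is available, which matches the general bound stated in the list; the sketch recomputes the reduction on every call only to stay short.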
1.4.2 Asymptotic bounds


Good linear codes are easy to construct. A randomly chosen parity check matrix defines a
good code with exponentially high probability.


Theorem 1.4.1 Let $v_1, \ldots, v_{\alpha n}$ be vectors chosen uniformly at random from $GF(2)^n$. Let
$C$ be the code consisting of the length $n$ vectors over $GF(2)$ that are orthogonal to all
of $v_1, \ldots, v_{\alpha n}$. $C$ has rate at least $1 - \alpha$. With high probability, $C$ has relative minimum
distance at least $\epsilon$, for $\epsilon < 1/2$ and $\alpha > H(\epsilon)$, where $H(\cdot)$ is the binary entropy function
(i.e. $H(x) = -x \log_2 x - (1-x) \log_2 (1-x)$).

Proof: The probability that any non-zero word is orthogonal to each of $v_1, \ldots, v_{\alpha n}$ is
$2^{-\alpha n}$. Thus, the probability that some non-zero vector of weight at most $\epsilon n$ is orthogonal
to each of $v_1, \ldots, v_{\alpha n}$ is at most $\sum_{i=1}^{\epsilon n} \binom{n}{i} 2^{-\alpha n}$. One can use Stirling’s formula to show that,
for fixed $\epsilon$, $\log_2 \binom{n}{\epsilon n} = nH(\epsilon) + O(\log n)$. Thus, the sum approaches zero if $\alpha > H(\epsilon)$. □

Codes with rate $r$ and minimum relative distance $\epsilon$ for $r \ge 1 - H(\epsilon)$ are said to meet the
Gilbert-Varshamov bound. While these random linear codes may be encoded using quadratic
work, we know of no efficient algorithm for decoding them.
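
The union bound in the proof of Theorem 1.4.1 can be evaluated exactly for small parameters. The sketch below is an illustration under assumed names (`binary_entropy`, `union_bound`) and assumed sample parameters ($\epsilon = 0.1$, $\alpha = 0.6$); it only evaluates the bound and does not sample random codes.

```python
from math import comb, log2


def binary_entropy(x):
    """H(x) = -x log2 x - (1 - x) log2 (1 - x), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def union_bound(n, num_checks, max_weight):
    """Upper bound on the probability that some non-zero word of weight at
    most max_weight is orthogonal to num_checks uniformly random vectors:
    sum_{i=1}^{max_weight} C(n, i) * 2^(-num_checks)."""
    low_weight_words = sum(comb(n, i) for i in range(1, max_weight + 1))
    return low_weight_words / 2 ** num_checks


# Assumed sample parameters: epsilon = 0.1 and alpha = 0.6 > H(0.1).
print("H(0.1) =", binary_entropy(0.1))
for n in (100, 200, 400):
    print(n, union_bound(n, num_checks=6 * n // 10, max_weight=n // 10))
```

With $\alpha$ above $H(\epsilon)$ the printed bound shrinks as $n$ grows, which is the behaviour the proof relies on.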


An easy upper bound on the performance of a code is the sphere-packing bound: 


Theorem 1.4.2 Let $C_n$ be an infinite sequence of codes with relative minimum distance at
least $\delta$. Then

$$\limsup_{n \to \infty} \mathrm{rate}(C_n) \le 1 - H(\delta/2).$$

Proof: If the code has minimum distance $\delta n$, then there are disjoint balls of radius $\delta n/2$
around each codeword. These balls account for at least $2^{rn} \binom{n}{\delta n/2}$ words, where $r$ is the
rate of the code; but, there are only $2^n$ words of length $n$. □


Better upper bounds than the sphere-packing bound are known. The Elias bound, which
held the record for a long time, has a fairly simple proof.

Theorem 1.4.3 [Elias] Let $C_n$ be an infinite sequence of codes with relative minimum distance
at least $\delta$. Then

$$\limsup_{n \to \infty} \mathrm{rate}(C_n) \le 1 - H\left(\frac{1}{2} - \frac{1}{2}\sqrt{1 - 2\delta}\right).$$

Even better, but much more complicated to prove, is the McEliece-Rodemich-Rumsey-Welch
upper bound:

Theorem 1.4.4 [McEliece-Rodemich-Rumsey-Welch] Let $C_n$ be an infinite sequence of codes
with relative minimum distance at least $\delta$. Then

$$\limsup_{n \to \infty} \mathrm{rate}(C_n) \le H\left(\frac{1}{2} - \sqrt{\delta(1 - \delta)}\right).$$


Proofs of both of these can be found in [MS77]. 
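
To get a feeling for how the bounds of this section compare, the following sketch evaluates the Gilbert-Varshamov rate and the three upper bounds at a single relative distance. The function names and the sample value $\delta = 0.2$ are assumptions for illustration; the formulas are the ones stated in Theorems 1.4.1 through 1.4.4 for binary codes with $0 < \delta < 1/2$.

```python
from math import log2, sqrt


def H(x):
    """Binary entropy function, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def gilbert_varshamov(delta):
    """Rate achieved by random linear codes (a lower bound on the best rate)."""
    return 1 - H(delta)


def sphere_packing(delta):
    return 1 - H(delta / 2)


def elias(delta):
    return 1 - H(0.5 - 0.5 * sqrt(1 - 2 * delta))


def mrrw(delta):
    return H(0.5 - sqrt(delta * (1 - delta)))


delta = 0.2  # assumed sample relative minimum distance
for name, bound in [("Gilbert-Varshamov", gilbert_varshamov),
                    ("McEliece-Rodemich-Rumsey-Welch", mrrw),
                    ("Elias", elias),
                    ("sphere-packing", sphere_packing)]:
    print(f"{name}: {bound(delta):.3f}")
```

At this sample value the achievable rate lies below all three upper bounds, and the upper bounds tighten in the order sphere-packing, Elias, McEliece-Rodemich-Rumsey-Welch. The sketch makes no claim about the ordering at other values of $\delta$.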
1.5 The complexity of coding


Many of the foundations for the analysis of the complexity of algorithms were laid by Shan- 
non in his 1949 paper The synthesis of two-terminal switching circuits [Sha49]. Suggestions 
for how to analyze the complexity issues special to error-correcting codes appeared in the 
work of Savage [Sav69, Sav71] and Bassalygo, Zyablov, and Pinsker [BZP77]. For more gen- 
eral studies of the complexity of algorithms, we point the reader to [AHU74] and [CLR90]. 

From our perspective, the relevant questions are: how hard is it to encode a family of 
error-correcting codes, and how much time does it take to correct a constant fraction of 
errors? We are not as interested in the minimum distance of a code as we are in the number 
of errors that we can efficiently correct. We measure encoding and decoding efficiency by 
the time of a RAM algorithm or the size and depth of a boolean circuit that performs the 
operations. Bassalygo, Zyablov, and Pinsker point out that we should also examine the 
complexity of building the encoding and decoding programs or circuits. While we do not 
discuss this in detail, it will be clear that there are efficient polynomial-time algorithms for 
constructing our encoding and decoding programs and circuits. 

Initially, the only algorithms known for decoding error-correcting codes were exponential 
in complexity: enumerate all codewords and select one closest to the received word. For 
linear codes, one could do slightly better by computing which parity checks were violated 
and searching for the smallest pattern of errors that would violate exactly that set of parity 
checks. There were numerous improvements on these approaches. We will not attempt to 
survey the accomplishments of coding theory; we will just mention the most efficient coding
algorithms known prior to our work. Using efficient implementations of the Finite Fourier
Transform, Justesen [Jus76] and Sarwate [Sar77] have shown that certain Reed-Solomon
and Goppa codes can be encoded in $O(n \log n)$ time and decoded in $O(n \log^2 n)$ time.
While these codes are not necessarily asymptotically good, one can compose them with 
good codes to obtain asymptotically good codes with similar encoding and decoding times. 
Moreover, these algorithms are easily parallelized. 
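
The parity-check search mentioned earlier in this paragraph can be sketched as follows: compute which parity checks the received word violates, then try error patterns in order of increasing weight until one violates exactly that set. This is an illustrative sketch of that exponential-time approach, not one of the efficient algorithms cited above; the function names and the [7,4] Hamming check matrix are assumptions for the example.

```python
from itertools import combinations


def syndrome(H, word):
    """Tuple of parity-check values of word; a 1 marks a violated check."""
    return tuple(sum(h & w for h, w in zip(row, word)) % 2 for row in H)


def syndrome_decode(H, received):
    """Return the received word corrected by a smallest error pattern whose
    syndrome equals that of the received word. The search is exponential in
    the word length in the worst case."""
    n = len(received)
    target = syndrome(H, received)
    for weight in range(n + 1):
        for positions in combinations(range(n), weight):
            error = [0] * n
            for p in positions:
                error[p] = 1
            if syndrome(H, error) == target:
                return [r ^ e for r, e in zip(received, error)]
    # Not reached: the received word itself is an error pattern that matches.


# Hypothetical example: a check matrix of the [7,4] Hamming code.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
sent = [1, 1, 1, 0, 0, 0, 0]
received = sent[:]
received[4] ^= 1  # a single bit is corrupted in transmission

print(syndrome(H, received), syndrome_decode(H, received))
```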

Codes that have more efficient algorithms for one of these operations have suffered in
the other. Gelfand, Dobrushin, and Pinsker [GDP73] presented randomized constructions
of asymptotically good codes that could be encoded in linear time. However, they did
not suggest algorithms for decoding their codes, and we suspect that a polynomial-time 
algorithm would be difficult to find. 

Zyablov and Pinsker [ZP76] showed that it is possible to decode Gallager’s randomly 
chosen low-density parity-check codes [Gal63] in logarithmic time with a linear number of 
processors. These codes are essentially the same as those we present in Section 2.3. We 
are not aware of any algorithm for encoding these codes that uses less than $O(n^2)$ work.
Kuznetsov [Kuz73] used these codes to construct fault-tolerant memories. Pippenger has 
pointed out that Kuznetsov’s proof of the correctness of these memories can serve as a 
proof of correctness of the parallel decoding algorithm that we present in Section 2.3.3. 
By analyzing these codes in terms of the expansion properties of the graphs by which 
they are defined, we are able to provide a much simpler proof of the correctness of the 
parallel decoding algorithm, prove for the first time the correctness of the natural sequential 
decoding algorithm presented in Section 2.3.1, and obtain the first explicit constructions of
asymptotically good low-density parity-check codes.


CHAPTER 2 


Expander codes 


In this chapter, we explain a way of using expander graphs to construct asymptotically 
good linear error-correcting codes. These codes can be decoded in linear sequential time or 
parallel logarithmic time with a linear number of processors. The best encoding algorithms 
that we know for these codes are the $O(n^2)$ time algorithms that can be used for all linear
codes. These codes fall into the category of low-density parity-check codes introduced by 
Gallager [Gal63]. The construction that we present in Section 2.5 is the first known explicit 
construction of an asymptotically good family of low-density codes. 

We begin by explaining what expander graphs are and prove that good expander graphs 
exist. In Section 2.2, we define expander codes precisely and prove that they can be asymp- 
totically good. In Section 2.3, we show how error-correcting codes derived from very good 
expander graphs can be efficiently decoded. The construction that we present in this section 
is similar to Gallager’s construction. In Section 2.3.1, we provide the first proof of correct- 
ness for the natural sequential algorithm for decoding these codes. In Section 2.3.2, we 
demonstrate that this algorithm only works on codes derived from expander graphs. Thus, 
Gallager’s codes work precisely when the graphs from which they are derived are expanders. 
Unfortunately, we are not aware of deterministic constructions of expander graphs with the
level of expansion needed for this first construction.

In Section 2.4, we survey some of what is known about explicit constructions of expander
graphs. We use these constructions in Section 2.5 to produce explicit constructions of
expander codes that can be decoded efficiently. 

We conclude this chapter with a discussion of how one might want to implement these
codes and a presentation of the results of experiments which demonstrate the good
performance of these codes.


2.1 Introduction to expander graphs 


An expander graph is a graph in which every set of vertices has a large number of neighbors. 
It is a perhaps surprising but nonetheless well known fact that expander graphs that expand 
by a constant factor, but which have only a linear number of edges, do exist. In fact, a 
simple randomized process will produce such a graph. 

Let $G = (V, E)$ be a graph on $n$ vertices. To describe the expansion properties of $G$, we
say that every set of size at most $m$ expands by a factor of $c$ if, for all sets $S \subseteq V$,

\[ |S| \le m \implies |\{y : \exists x \in S \text{ such that } (x,y) \in E\}| > c|S|. \]


One can show that for all $\epsilon > 0$ there exists a $\delta > 0$ such that, for sufficiently large $n$, a
random $d$-regular graph will probably expand by a factor of $d - 1 - \epsilon$ on sets of size $\delta n$. We
cannot hope to find graphs that expand by a factor greater than $d - 1$ because it is easy
to find sets with no more than this level of expansion: the vertices of a cycle in the graph do the
trick (a graph of degree greater than two has cycles of logarithmic size). Ideally, we should
describe the expansion of a graph by presenting a function that gives the expansion factor
for each size of set.
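The definition above can be checked directly on a tiny graph. The following is a minimal brute-force sketch, not an efficient test: the function name `expands_by` and the dictionary-of-sets graph representation are assumptions made for illustration, and the running time is exponential in $m$.

```python
from itertools import combinations

def expands_by(adj, m, c):
    """Return True if every nonempty set S with |S| <= m has more than c*|S| neighbors.

    adj maps each vertex to the set of its neighbors (illustrative representation).
    """
    vertices = list(adj)
    for size in range(1, m + 1):
        for subset in combinations(vertices, size):
            neighbors = set()
            for x in subset:
                neighbors |= adj[x]
            if len(neighbors) <= c * size:
                return False
    return True

# The 3-dimensional cube: 8 vertices labeled 0..7, each adjacent to the
# three labels that differ from it in exactly one bit.
cube = {v: {v ^ 1, v ^ 2, v ^ 4} for v in range(8)}
```

On the cube, single vertices expand by a factor of more than 2, but pairs of vertices at distance two share two neighbors, so sets of size at most 2 do not expand by a factor of more than 2.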

In our constructions, we will make use of unbalanced bipartite expander graphs. That is, 
the vertices of the graph will be divided into two sets such that there are no edges between 
vertices in the same set. We will call such a graph (d,c)-regular if all the nodes in one set 
have degree d and all the nodes in the other have degree c. By counting edges, we find 
that the number of d-regular vertices must differ from the number of c-regular vertices by 


a factor of c/d. We will only consider the expansion of sets of vertices contained within one
side of the graph. In Section 2.1.1, we will show that if $c > d$ and one chooses a $(d, c)$-regular
graph at random, then it will have expansion approaching $d - 1$ from the large side to the
small side and expansion approaching $c - 1$ from the small side to the large side.

We wish to point out that Alon, Bruck, Naor, Naor, and Roth [ABN+92] used expander
graphs in a different way to construct asymptotically good families of error-correcting codes
that lie above the Zyablov bound [MS77]. Also, Alon and Roichman [AR94] use
error-correcting codes to construct expander graphs.


2.1.1 The expansion of random graphs 


In this section, we will prove upper and lower bounds on the expansion factors achieved by 
random graphs that become tight as the degrees of the graphs become large. 


We first prove a simple upper bound on the expansion any graph can achieve. 


Theorem 2.1.1 Let $B$ be a bipartite graph between $n$ $d$-regular vertices and $\frac{d}{c}n$ $c$-regular
vertices. For all $0 < \alpha < 1$, there exists a set of $\alpha n$ $d$-regular vertices with at most

\[ n\frac{d}{c}\left(1 - (1 - \alpha)^c\right) + O(1) \]

neighbors.


Proof: Choose a set $X$ of $\alpha n$ $d$-regular vertices uniformly at random. Now, consider
the probability that a given $c$-regular vertex is not a neighbor of the set $X$.
Each neighbor of the $c$-regular vertex is in the set $X$ with probability $\alpha$. Thus, the
probability that the $c$-regular vertex is not a neighbor of $X$ is

\[ \binom{n - c}{\alpha n} \Big/ \binom{n}{\alpha n}, \]

which tends to $(1 - \alpha)^c$ as $n$ grows large. This implies that the expected number of
non-neighbors tends to $n\frac{d}{c}(1 - \alpha)^c$. □

This simple upper bound becomes tight as $d$ grows large.

How to choose a random (d,c)-regular graph: To choose a random (d,c)-regular 


bipartite graph, we first choose a random matching between dn “left” nodes and dn “right” 


nodes. We collapse consecutive sets of $d$ left nodes to form the $n$ $d$-regular vertices, and we
collapse consecutive sets of $c$ right nodes to form the $\frac{d}{c}n$ $c$-regular vertices. It is possible
that this graph will have multiedges that should be thrown away, but this does not hurt
the lower bound on the expansion of this graph that we will prove.
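The matching construction just described can be sketched in a few lines. This is an illustrative sketch only: the function name `random_biregular_graph`, the `seed` parameter, and the adjacency-list output format are assumptions, and multiedges are kept rather than thrown away.

```python
import random

def random_biregular_graph(n, d, c, seed=None):
    """Sample a (d, c)-regular bipartite multigraph by the matching construction.

    Returns var_nbrs, where var_nbrs[v] lists the d constraints adjacent to
    variable v. A repeated entry indicates a multiedge.
    """
    if (d * n) % c != 0:
        raise ValueError("c must divide d*n so that there are (d/c)*n constraints")
    rng = random.Random(seed)
    right = list(range(d * n))   # the dn "right" nodes
    rng.shuffle(right)           # random matching: left node i is matched to right[i]
    var_nbrs = [[] for _ in range(n)]
    for left in range(d * n):
        variable = left // d             # collapse consecutive sets of d left nodes
        constraint = right[left] // c    # collapse consecutive sets of c right nodes
        var_nbrs[variable].append(constraint)
    return var_nbrs
```

For example, `random_biregular_graph(12, 3, 6, seed=1)` yields 12 variables of degree 3 and 6 constraints of degree 6, counting multiedges with multiplicity.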

To make our language consistent with the rest of this chapter, we will call the d-regular 


vertices “variables” and the c-regular vertices “constraints”. 


Theorem 2.1.2 Let $B$ be a randomly chosen $(d, c)$-regular bipartite graph between $n$ variables
and $\frac{d}{c}n$ constraints. Then, for all $0 < \alpha < 1$, with exponentially high probability all sets of $\alpha n$
variables in $B$ have at least

\[ n\left(\frac{d}{c}\left(1 - (1 - \alpha)^c\right) - \sqrt{2 d \alpha H(\alpha)/\log_2 e}\right) \]

neighbors, where $H(\cdot)$ is the binary entropy function.


Proof: First, we fix a set of $\alpha n$ variables, $V$, and estimate the probability that $V$’s set
of neighbors is small. The probability that a given constraint is a neighbor of $V$ is at least
$1 - (1 - \alpha)^c$. Thus, the expected number of neighbors of $V$ is at least $n\frac{d}{c}(1 - (1 - \alpha)^c)$.
Noga Alon suggested that we form a martingale (see [AS92a]) to bound the probability
that the size of the set of neighbors deviates from this expectation.

Each node in $V$ will have $d$ outgoing edges. We will consider the process in which the
destinations of these edges are revealed one at a time. We will let $X_i$ be the random variable
equal to the expected size of the set of neighbors of $V$ given that the first $i$ edges leaving $V$
have been revealed. $X_0, \ldots, X_{d\alpha n}$ form a martingale such that

\[ |X_{i+1} - X_i| \le 1 \]

for all $0 \le i < d\alpha n$. Thus, by Azuma’s Inequality (see [AS92a]),

\[ \mathrm{Prob}\left[E[X_{d\alpha n}] - X_{d\alpha n} > \lambda\sqrt{d\alpha n}\right] < e^{-\lambda^2/2}. \]

But $E[X_{d\alpha n}]$ is just the expected number of neighbors of $V$. Moreover, $X_{d\alpha n}$ is the expected
size of the set of neighbors of $V$ given that all edges leaving $V$ have been revealed, which is
exactly the size of the set of neighbors of $V$.

Since there are only $\binom{n}{\alpha n}$ choices for the set $V$, it suffices to choose $\lambda$ so that

\[ \binom{n}{\alpha n} e^{-\lambda^2/2} \ll 1. \]

By Stirling’s formula, this holds for large $n$ if $\lambda$ satisfies

\[ nH(\alpha)/\log_2 e < \lambda^2/2 \iff \sqrt{2nH(\alpha)/\log_2 e} < \lambda. \]
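To get a feel for how the bounds of Theorems 2.1.1 and 2.1.2 compare, one can evaluate them numerically. The sketch below is only an illustration; the function names are assumptions, both quantities are normalized by $n$, and the $O(1)$ term of Theorem 2.1.1 is dropped.

```python
import math

def binary_entropy(a):
    """H(a) = -a log2(a) - (1 - a) log2(1 - a), for 0 < a < 1."""
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def expansion_upper_bound(d, c, a):
    """Theorem 2.1.1: some set of a*n variables has at most about this many neighbors, per n."""
    return (d / c) * (1 - (1 - a) ** c)

def expansion_lower_bound(d, c, a):
    """Theorem 2.1.2: w.h.p. every set of a*n variables has at least this many neighbors, per n."""
    deviation = math.sqrt(2 * d * a * binary_entropy(a) / math.log2(math.e))
    return expansion_upper_bound(d, c, a) - deviation
```

The gap between the two is the deviation term, which grows like $\sqrt{d}$ while the main term grows like $d$ for a fixed ratio $d/c$. For small degrees the lower bound can be vacuous: with $d = 8$, $c = 16$, and $\alpha = 0.01$ it evaluates to a negative number.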


In general, if a graph has good expansion on a certain size set, then it will have similar 
expansion on smaller sets. However, this is not always true. What we can say is that 
the probability that sets of a certain size have a given expansion factor is a unimodal 
function. Theorem 2.1.2 shows that the probability of failure is exponentially small for large 
sets. However, for sets of constant size the probability of failure will be only polynomially 
small. One can probably show that whenever the probability that large sets fail to have a certain
expansion factor is exponentially small, all smaller sets will have the same expansion factor
with high probability.


2.2 Expander codes 


To build an expander code, we begin with an unbalanced bipartite expander graph. Say 
that the graph is $(d,c)$-regular between sets of vertices of size $n$ and $\frac{d}{c}n$, and that $c > d$. We
will identify each of the $n$ nodes on the large side of the graph with one of the bits in a code
of length $n$. We will usually refer to these $n$ bits as variables. Each of the $\frac{d}{c}n$ vertices on the
small side of the graph will be associated with a constraint. Each constraint will restrict 
only those variables that are neighbors of the vertex identified with the constraint (See 
Figure 2-1). These will be called the “variables in the constraint”. A constraint will require 
that the variables it restricts form a codeword in some linear code of length c. Because each 
constraint we impose upon the variables is linear, the expander codes we construct will be 


linear as well. It is convenient to let all the constraints be the same.


[Diagram: the variables on one side of the bipartite graph, the constraints on the other, with one constraint marked as restricting the variables adjacent to it.]

Figure 2-1: A constraint restricts the variables that are its neighbors.


Definition 2.2.1 Let $B$ be a bipartite graph between $n$ variables and $\frac{d}{c}n$ constraints that is
d-regular on the variables and c-regular on the constraints. Let S be a code of block-length c. 
A constraint is satisfied by a setting of the variables if the variables in that constraint form a 
codeword of S. The expander code C(B,S) consists of the settings of the variables that satisfy 


every constraint. 
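Definition 2.2.1 translates directly into a membership test. The sketch below is illustrative: the names `is_codeword` and `even_weight`, the list-of-lists representation of the constraints, and the small example graph are all assumptions, and the subcode $S$ is passed in as a predicate.

```python
def is_codeword(word, constraints, in_subcode):
    """Return True if word satisfies every constraint of the expander code C(B, S).

    constraints[j] lists the variables in constraint j, in a fixed order.
    in_subcode(bits) decides whether a length-c bit list is a codeword of S.
    """
    return all(in_subcode([word[v] for v in cons]) for cons in constraints)

def even_weight(bits):
    """The subcode S consisting of the words of even weight."""
    return sum(bits) % 2 == 0

# A toy (2, 4)-regular graph: 6 variables of degree 2, 3 constraints of degree 4.
toy_constraints = [[0, 1, 2, 3], [2, 3, 4, 5], [0, 1, 4, 5]]
```

With this toy graph, `[1, 1, 0, 0, 0, 0]` satisfies all three parity constraints, while `[1, 0, 0, 0, 0, 0]` violates the first one.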


If the expander graph is a sufficiently good expander and if the constraints are identified 


with sufficiently good codes, then the resulting expander code will be a good code. 


Theorem 2.2.2 Let $B$ be a bipartite graph between $n$ variables and $\frac{d}{c}n$ constraints that is
$d$-regular on the variables and $c$-regular on the constraints. Let $S$ be a code of block-length $c$,
rate $r$, and relative minimum distance $\epsilon$. If $B$ expands by a factor of more than $\frac{d}{c\epsilon}$ on all sets
of size at most $\alpha n$, then $C(B,S)$ has rate at least $dr - (d - 1)$ and relative minimum distance
at least $\alpha$.


Proof: To obtain the bound on the rate of the code, we will count the number of linear
restrictions imposed by the constraints. Each constraint induces $(1 - r)c$ linear restrictions.
Thus, there are a total of

\[ n\frac{d}{c}(1 - r)c = dn(1 - r) \]

linear restrictions, which implies that there are at least $n(dr - (d - 1))$ degrees of freedom.

To prove the bound on the minimum distance, we will show that there can be no non-zero
codeword of weight less than $\alpha n$. Let $w$ be a non-zero word of weight at most $\alpha n$
and let $V$ be the set of variables that are 1 in this word. There are $d|V|$ edges leaving the
variables in $V$. The expansion property of the graph implies that these edges will enter
more than $\frac{d}{c\epsilon}|V|$ constraints. Thus, the average number of edges per constraint will be less
than $c\epsilon$, so there must be some constraint that is a neighbor of $V$, but which has a number
of neighbors in $V$ that is less than the minimum distance of $S$. This implies that $w$ cannot
induce a codeword of $S$ in that constraint; so, $w$ cannot be a codeword in $C(B,S)$. □


Remark 2.2.3 A construction of codes defined by identifying the nodes on one side of a 
bipartite graph with the bits of the code and identifying the nodes on the other side with subcodes 
first appeared in the work of Tanner [Tan81]. Following Gallager’s lead, Tanner analyzed the 
performance of his codes by examining the girth of the bipartite graph. Margulis [Mar73] also 
used high-girth graphs to construct error-correcting codes. Unfortunately, it seems that analysis 
resting on high-girth is insufficient to demonstrate that families of codes are asymptotically 


good. 


2.3 A simple example 


A simple example of expander codes is obtained by letting $B$ be a graph with expansion
greater than $\frac{d}{2}$ on sets of size at most $\alpha n$, and letting $S$ be the code consisting of words of
even weight. The parity-check matrix of the resulting code, $C(B,S)$, is just the adjacency
matrix of $B$. The code $S$ has rate $\frac{c-1}{c}$ and minimum relative distance $\frac{2}{c}$, so $C(B,S)$ has
rate at least $1 - \frac{d}{c}$ and minimum distance at least $\alpha n$.
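For reference, the parameters quoted here follow from substituting the even-weight code into Theorem 2.2.2; the short calculation below only spells out that substitution, with $r = \frac{c-1}{c}$ and $\epsilon = \frac{2}{c}$:

```latex
\begin{align*}
\text{rate of } C(B,S) &\ge dr - (d - 1) = d\cdot\frac{c-1}{c} - (d - 1) = 1 - \frac{d}{c}, \\
\text{required expansion} &> \frac{d}{c\epsilon} = \frac{d}{c\cdot(2/c)} = \frac{d}{2}.
\end{align*}
```

As an illustrative instance, $d = 3$ and $c = 6$ give rate at least $\frac{1}{2}$ and a required expansion factor greater than $\frac{3}{2}$.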

To obtain a code that we can decode efficiently, we will need even greater expansion. 
With greater expansion, small sets of corrupt variables will induce non-codewords in many 
constraints. By examining these “unsatisfied” constraints, we will be able to determine 
which variables are corrupted. In Sections 2.3.1 and 2.3.3, we will explain how to decode 


these simple expander codes.

Unfortunately, we do not know of explicit constructions of expander graphs with expansion
greater than $\frac{d}{2}$. Thus, in order to construct these simple codes, we must use the
randomized construction of expanders explained in Section 2.1.


2.3.1 Sequential decoding 


There is a natural algorithm for decoding these simple expander codes. We say that a 
constraint is “satisfied” by a word w if the sum of the values that w assigns to the variables 
in the constraint is even; otherwise, we call the constraint “unsatisfied”. Consider what 
happens when we flip¹ a variable that is in more unsatisfied than satisfied constraints. The
unsatisfied constraints containing the variable become satisfied, and vice versa. Thus, we 
have decreased the total number of unsatisfied constraints. The idea behind the sequential 
decoding algorithm is to keep doing this until no unsatisfied constraints remain, in which 
case we have a codeword. Theorem 2.3.1 says that if the graph used to define the code 
is a good expander and if not too many variables of a codeword are corrupted, then this 
algorithm will succeed. 


Sequential expander code decoding algorithm: 


• If there is a variable that is in more unsatisfied than satisfied constraints, then flip
the value of that variable.

• Repeat until no such variables remain.


It is easy to implement this algorithm so that it runs in linear time (assuming that 
pointer references have unit cost). In Figure 2-2, we present one such way of implementing 
this algorithm. We assume that the graph has been provided to the algorithm as a graph 
of pointers in which each constraint points to the variables it contains, and each variable 
points to the constraints in which it appears. The implementation runs in two phases: 
a set-up phase that requires linear time, and then a loop that takes constant time per 
iteration. During the set-up phase, the variables are partitioned into lists by the number of 


unsatisfied constraints in which they appear. During normal iteration of the loop, a variable 


¹If the variable was 0, make it 1. If it was 1, make it 0.

Set-up phase

For each constraint, compute the parity of the sum of the variables it contains. (The
algorithm should have a list of the constraints.)

Initialize lists $L_0, \ldots, L_d$.

For each variable, count the number of unsatisfied constraints in which it appears.
If this number is $i$, then put the variable in list $L_i$.

Loop
Until lists $L_{\lceil d/2 \rceil}, \ldots, L_d$ are empty do:
    Find the greatest $i$ such that $L_i$ is not empty
    Choose a variable $v$ from list $L_i$
    Flip the value of variable $v$
    For each constraint $c$ that contains variable $v$
        Update the status of constraint $c$
        For each variable $w$ in constraint $c$
            Recompute the number of unsatisfied constraints in which $w$ appears. Move it
            to the appropriate list.

If all variables are in list $L_0$, then output the values of the variables. Otherwise,
report “failed to decode”.

Figure 2-2: An implementation of the sequential expander code decoding algorithm.


that appears in the greatest number of unsatisfied constraints is flipped; the status of each 
constraint that contains that variable is updated; and each variable that appears in each of 
those constraints is moved to the list that reflects its new number of unsatisfied constraints. 
If, at some point, there is no variable in more unsatisfied than satisfied constraints, the 
implementation leaves the loop and checks whether it has successfully decoded its input. 
If all the variables are in the list $L_0$, then there are no unsatisfied constraints and the
implementation will output a codeword. We will show that the loop is executed at most a 


linear number of times. 
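The flipping rule can be sketched compactly in code. The version below is a simplified illustration rather than the implementation of Figure 2-2: it keeps a worklist of candidate variables instead of the lists $L_0, \ldots, L_d$, flips only variables in strictly more unsatisfied than satisfied constraints, and assumes a graph without multiedges. The name `sequential_decode` and the toy graph (variables are the edges of the complete graph on four vertices, constraints are its vertices) are assumptions; the toy graph is not claimed to meet the expansion hypothesis of Theorem 2.3.1.

```python
def sequential_decode(word, constraints):
    """Repeatedly flip a variable that is in more unsatisfied than satisfied constraints.

    constraints[j] lists the variables in parity constraint j.
    Returns (decoded_word, success), where success means every constraint is satisfied.
    """
    word = list(word)
    n = len(word)
    var_cons = [[] for _ in range(n)]          # variable -> constraints containing it
    for j, cons in enumerate(constraints):
        for v in cons:
            var_cons[v].append(j)
    # Set-up phase: parity[j] == 1 means constraint j is unsatisfied.
    parity = [sum(word[v] for v in cons) % 2 for cons in constraints]
    unsat = [sum(parity[j] for j in var_cons[v]) for v in range(n)]
    work = [v for v in range(n) if 2 * unsat[v] > len(var_cons[v])]
    # Loop: each flip strictly decreases the number of unsatisfied constraints.
    while work:
        v = work.pop()
        if 2 * unsat[v] <= len(var_cons[v]):
            continue                            # stale worklist entry
        word[v] ^= 1
        for j in var_cons[v]:
            parity[j] ^= 1
            delta = 1 if parity[j] else -1
            for w in constraints[j]:
                unsat[w] += delta
                if 2 * unsat[w] > len(var_cons[w]):
                    work.append(w)
    return word, not any(parity)

# Toy graph: variables 0..5 are the edges (0,1),(0,2),(0,3),(1,2),(1,3),(2,3) of K4;
# constraint i contains the edges incident to vertex i.
k4_constraints = [[0, 1, 2], [0, 3, 4], [1, 3, 5], [2, 4, 5]]
codeword = [1, 1, 0, 1, 0, 0]   # the triangle on vertices 0, 1, 2
received = [1, 1, 0, 1, 0, 1]   # the same word with variable 5 corrupted
```

Running `sequential_decode(received, k4_constraints)` flips variable 5, the only variable with both of its constraints unsatisfied, and recovers `codeword`.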


Theorem 2.3.1 Let $B$ be a bipartite graph between $n$ variables of degree $d$ and $\frac{d}{c}n$ constraints
of degree $c$ such that all sets $X$ of at most $\alpha n$ variables have at least $(\frac{3}{4} + \epsilon)d|X|$ neighbors, for
some $\epsilon > 0$. Let $C(B)$ be the code consisting of those settings of the variables that cause every
constraint to have parity zero. Then the sequential decoding algorithm will correct up to an
$\alpha/2$ fraction of errors while executing the decoding loop at most $d\alpha n/2$ times. Moreover, the
algorithm runs in linear time on all inputs, regardless of whether or not $B$ is a good expander.

Proof: We will say that the decoding algorithm is in state $(v,u)$ if $v$ variables are
corrupted and $u$ constraints are unsatisfied. We view $u$ as a potential associated with $v$.
Our goal is to demonstrate that the potential will eventually reach zero. To do this, we
will show that if the decoding algorithm begins with a word of weight at most $\alpha n/2$, then,
at every step, there will be some variable with more unsatisfied neighbors than satisfied
neighbors.

First, we consider what happens when the algorithm is in a state $(v,u)$ with $v \le \alpha n$.
Let $s$ be the number of satisfied neighbors of the corrupted variables. By the expansion of
the graph, we know that

\[ u + s > \left(\tfrac{3}{4} + \epsilon\right) dv. \]

Because each satisfied neighbor of the corrupted variables must share at least two edges with
the corrupted variables, and each unsatisfied neighbor must have at least one, we know that

\[ dv \ge u + 2s. \]

By combining these two inequalities, we obtain

\[ s < \left(\tfrac{1}{4} - \epsilon\right) dv \quad\text{and}\quad u > \left(\tfrac{1}{2} + 2\epsilon\right) dv. \tag{2.1} \]
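For readers who want the intermediate step, the two bounds in (2.1) can be obtained from the expansion inequality and the edge-counting inequality as follows; this only spells out the combination and introduces no new assumptions:

```latex
\begin{align*}
s &\le dv - (u + s) < dv - \left(\tfrac{3}{4} + \epsilon\right) dv = \left(\tfrac{1}{4} - \epsilon\right) dv, \\
u &> \left(\tfrac{3}{4} + \epsilon\right) dv - s > \left(\tfrac{3}{4} + \epsilon\right) dv - \left(\tfrac{1}{4} - \epsilon\right) dv = \left(\tfrac{1}{2} + 2\epsilon\right) dv.
\end{align*}
```

The first line rewrites $dv \ge u + 2s$ as $s \le dv - (u + s)$ and then applies the expansion bound on $u + s$; the second line substitutes the resulting bound on $s$ back into the expansion bound.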


Since each unsatisfied constraint must share at least one edge with a corrupted variable, and
since there are only $dv$ edges leaving the corrupted variables, we see that at least a $(\frac{1}{2} + 2\epsilon)$
fraction of the edges leaving the corrupted variables must enter unsatisfied constraints.
This implies that there must be some corrupted variable such that a $(\frac{1}{2} + 2\epsilon)$ fraction of its
neighbors are unsatisfied. Of course, this does not mean that the decoding algorithm will
decide to flip a corrupted variable.

However, it does mean that the only way that the algorithm could fail to decode is if
it flips so many uncorrupt variables that $v$ becomes greater than $\alpha n$. Assume by way of
contradiction that this happens. Then there must be some time at which $v$ equals $\alpha n$. At
this time, equation (2.1) tells us that $u > \frac{d}{2}\alpha n$. This is a contradiction: $u$ is
initially at most $\frac{d}{2}\alpha n$ and can only decrease during the execution of the algorithm, so the
algorithm could only be in a state $(\alpha n, u)$ with $u \le \frac{d}{2}\alpha n$.

To see that the implementation of the sequential decoding algorithm runs in linear time,
first observe that the degree of every variable and constraint is constant; so, the set-up phase
requires linear time. Moreover, every time a variable is flipped by the algorithm, the number
of unsatisfied constraints decreases. Thus, the loop cannot be executed a number of times
greater than the number of constraints. Because the degree of each variable and constraint
is constant, each iteration requires only constant time. □


We note that it is possible to improve the constants in this analysis by taking into 
account the fact that after each decoding step, the number of corrupted variables actually 


decreases by at least 2° — 


2.3.2 Necessity of Expansion 


We will now show that the sequential decoding algorithm works only if the graph B is an 


expander graph. 


Theorem 2.3.2 Let $B$ be a bipartite graph between $n$ variables of degree $d$ and $\frac{d}{c}n$ constraints
of degree $c$ such that the sequential expander code decoding algorithm successfully decodes all
sets of at most $\alpha n$ errors in the code $C(B)$. Then, all sets of $\alpha n$ variables must have at least

\[ \alpha n\left(1 + \frac{2\,\frac{d-1}{2c}}{3 + \frac{d-1}{2c}}\right) \]

neighbors in $B$.


Proof: We will first deal with the case in which $d$ is even. In this case, every time a
variable is flipped, the number of unsatisfied constraints decreases by at least 2. Consider the
performance of the decoding algorithm on a word of weight $\alpha n$. Because the algorithm stops
when the number of unsatisfied constraints reaches zero, the algorithm must decrease the
number of unsatisfied constraints by at least $2\alpha n$ as it corrects the $\alpha n$ corrupted variables.
Thus, every word of weight $\alpha n$ must cause at least $2\alpha n$ constraints to be unsatisfied, so
every set of $\alpha n$ variables must have at least $2\alpha n$ neighbors. Because we assume that $c > d$,

\[ 2 > 1 + \frac{2\,\frac{d-1}{2c}}{3 + \frac{d-1}{2c}}, \]

and we are done with the case in which $d$ is even.

When $d$ is odd, we can only guarantee that the number of unsatisfied constraints will
decrease by 1 at each iteration. This means that every set of $\alpha n$ variables must induce at
least $\alpha n$ unsatisfied constraints. Alone, this is insufficient to demonstrate expansion by a
factor greater than 1. However, let us consider what must happen for the algorithm to be
in a state in which $\alpha n$ variables are corrupted, but there is no variable that the decoding
algorithm can flip that will cause the number of unsatisfied constraints to decrease by more
than 1. This means that each corrupted variable has at least $\frac{d-1}{2}$ of its edges in satisfied
constraints. Because each satisfied constraint can have at most $c$ incoming edges, this
implies that there must be at least $\alpha n\frac{d-1}{2c}$ satisfied neighbors of the $\alpha n$ variables. Thus, the
set of $\alpha n$ variables must have at least $\alpha n(1 + \frac{d-1}{2c})$ neighbors.

On the other hand, if the algorithm decreases the number of unsatisfied constraints by
more than 1, then it must decrease the number by at least 3. For some word of weight $\alpha n$,
assume that the algorithm flips $\beta\alpha n$ variables before it flips a variable that decreases the
number of unsatisfied constraints by only 1. The original set of $\alpha n$ variables must have had
at least

\[ 3\beta\alpha n + (1 - \beta)\alpha n \]

neighbors. On the other hand, once the algorithm flips a variable that causes the number of
unsatisfied constraints to decrease by 1, we can apply the bound of the previous paragraph
to see that the variables must have at least

\[ (1 - \beta)\alpha n\left(1 + \frac{d-1}{2c}\right) \]

neighbors. We note that this bound is strictly decreasing in $\beta$, while the previous bound
is strictly increasing in $\beta$, so the lower bound that we can obtain on the expansion occurs
when $\beta$ is chosen so that

\begin{align*}
3\beta\alpha n + (1 - \beta)\alpha n &= \left(1 + \frac{d-1}{2c}\right)(1 - \beta)\alpha n \implies \\
1 + 2\beta &= (1 - \beta)\left(1 + \frac{d-1}{2c}\right) \implies \\
\beta\left(3 + \frac{d-1}{2c}\right) &= \frac{d-1}{2c} \implies \\
\beta &= \frac{\frac{d-1}{2c}}{3 + \frac{d-1}{2c}}.
\end{align*}

When we plug $\beta$ back in, we find that the set of $\alpha n$ variables must have at least

\begin{align*}
\alpha n\left(3\,\frac{\frac{d-1}{2c}}{3 + \frac{d-1}{2c}} + \left(1 - \frac{\frac{d-1}{2c}}{3 + \frac{d-1}{2c}}\right)\right)
&= \alpha n\,\frac{3 + 3\,\frac{d-1}{2c}}{3 + \frac{d-1}{2c}} \\
&= \alpha n\left(1 + \frac{2\,\frac{d-1}{2c}}{3 + \frac{d-1}{2c}}\right)
\end{align*}

neighbors. □


2.3.3 Parallel decoding


The sequential decoding algorithm has a natural parallel analogue: in parallel, flip each 
variable that appears in more unsatisfied than satisfied constraints. We will see that this 
algorithm can also correct a constant fraction of errors if the code is derived from a suffi- 
ciently good expander graph. 


Parallel expander code decoding algorithm: 
• In parallel, flip each variable that is in more unsatisfied than satisfied constraints.

• Repeat until no such variables remain.
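The two steps above can be simulated sequentially, one round at a time. The sketch below is illustrative only: the name `parallel_decode` and the `max_rounds` cap are assumptions (the cap is a safety stop for inputs on which simultaneous flipping does not converge), and the toy graph built from the complete graph on four vertices is not claimed to satisfy any expansion hypothesis.

```python
def parallel_decode(word, constraints, max_rounds=100):
    """Simulate rounds in which every variable in more unsatisfied than
    satisfied constraints is flipped simultaneously.

    Returns (word, success), where success means every constraint is satisfied.
    """
    word = list(word)
    n = len(word)
    var_cons = [[] for _ in range(n)]          # variable -> constraints containing it
    for j, cons in enumerate(constraints):
        for v in cons:
            var_cons[v].append(j)
    for _ in range(max_rounds):
        # parity[j] == 1 means constraint j is unsatisfied.
        parity = [sum(word[v] for v in cons) % 2 for cons in constraints]
        if not any(parity):
            return word, True
        # Decide all flips from the same snapshot, then apply them together.
        flips = [v for v in range(n)
                 if 2 * sum(parity[j] for j in var_cons[v]) > len(var_cons[v])]
        if not flips:
            return word, False
        for v in flips:
            word[v] ^= 1
    return word, False

# Toy graph: variables 0..5 are the edges (0,1),(0,2),(0,3),(1,2),(1,3),(2,3) of K4;
# constraint i contains the edges incident to vertex i.
k4_constraints = [[0, 1, 2], [0, 3, 4], [1, 3, 5], [2, 4, 5]]
sent = [1, 1, 0, 1, 0, 0]
garbled = [1, 1, 0, 1, 0, 1]
```

Here `parallel_decode(garbled, k4_constraints)` needs a single round: only variable 5 sits in more unsatisfied than satisfied constraints.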


Theorem 2.3.3 Let $B$ be a bipartite graph between $n$ variables of degree $d$ and $\frac{d}{c}n$ constraints
of degree $c$ such that all sets $X$ of at most $\alpha_0 n$ variables have more than $(\frac{3}{4} + \epsilon)d|X|$ neighbors, for
some $\epsilon > 0$. Let $C(B)$ be the code consisting of those settings of the variables that cause every
constraint to have parity zero. Then, $C(B)$ has rate at least $(1 - \frac{d}{c})$ and the parallel expander
code decoding algorithm will correct any $\alpha < \frac{\alpha_0(1 + 4\epsilon)}{2}$ fraction of errors after $\log_{1/(1 - 2\epsilon)}(\alpha n)$
decoding rounds, where each round requires constant time.

Proof: Assume that the algorithm is presented with a word of weight at most $\alpha n$, where
$\alpha < \frac{\alpha_0(1 + 4\epsilon)}{2}$. We will refer to the variables that are 1 as the corrupted variables, and we
will let $S$ denote this set of variables. We will show that after one decoding round, the
algorithm will produce a word that has at most $(1 - 2\epsilon)\alpha n$ corrupted variables.

To this end, we will examine the sizes of $F$, the set of corrupted variables that fail to flip
in one decoding round, and $C$, the set of variables that were originally uncorrupt, but which
become corrupt after one decoding round. After one decoding round, the set of corrupted
variables will be $C \cup F$. Define $v = |S|$, and set $\phi$, $\gamma$, and $\delta$ so that $|F| = \phi v$, $|C| = \gamma v$, and
$|N(S)| = \delta dv$. By expansion, $\delta > \frac{3}{4} + \epsilon$. To prove the theorem, we will show that

\[ \phi + \gamma < \frac{\frac{1}{4}}{\frac{1}{4} + \epsilon} \le 1 - 2\epsilon. \]


We will first show that

\[ \phi \le 4 - 4\delta. \]

We can bound the number of neighbors of $S$ by observing that each variable in $F$ must
share at least half of its neighbors with other corrupted variables. Thus, each variable in
$F$ can account for at most $\frac{3}{4}d$ neighbors, and, of course, each variable in $S \setminus F$ can account
for at most $d$ neighbors, which implies

\begin{align*}
\delta dv &\le \tfrac{3}{4}d\phi v + d(1 - \phi)v \implies \\
\phi &\le 4 - 4\delta.
\end{align*}
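We now show that

\[ \gamma < \frac{\delta - (\frac{3}{4} + \epsilon)}{\frac{1}{4} + \epsilon}. \]

Assume by way of contradiction that this is false. Let $C'$ be a subset of $C$ of size
$\frac{\delta - (\frac{3}{4} + \epsilon)}{\frac{1}{4} + \epsilon}\,v$.
Each variable in $C'$ must have at least $\frac{d}{2}$ edges that land in constraints that are neighbors
of $S$. Thus, the total number of neighbors of $C' \cup S$ is at most

\[ \frac{d}{2}|C'| + \delta dv. \]

But, because the set $C' \cup S$ has size at most

\[ \left(1 + \frac{\delta - (\frac{3}{4} + \epsilon)}{\frac{1}{4} + \epsilon}\right)\left(\frac{1 + 4\epsilon}{2}\right)\alpha_0 n \le \alpha_0 n, \]

it must have expansion greater than $d(\frac{3}{4} + \epsilon)$, which implies that

\[ d\left(\tfrac{3}{4} + \epsilon\right)\left(1 + \frac{\delta - (\frac{3}{4} + \epsilon)}{\frac{1}{4} + \epsilon}\right)v < \frac{d}{2}\cdot\frac{\delta - (\frac{3}{4} + \epsilon)}{\frac{1}{4} + \epsilon}\,v + \delta dv. \]

After simplifying this inequality, we find that it is a contradiction. Thus, we find

\begin{align*}
\phi + \gamma &< 4 - 4\delta + \frac{\delta - (\frac{3}{4} + \epsilon)}{\frac{1}{4} + \epsilon} \\
&= \frac{(\frac{1}{4} - \epsilon) + 4\epsilon - 4\epsilon\delta}{\frac{1}{4} + \epsilon} \\
&= \frac{\frac{1}{4} - \epsilon + 4\epsilon(1 - \delta)}{\frac{1}{4} + \epsilon} \\
&< \frac{\frac{1}{4} - \epsilon + 4\epsilon(\frac{1}{4})}{\frac{1}{4} + \epsilon} \\
&= \frac{\frac{1}{4}}{\frac{1}{4} + \epsilon}.
\end{align*}

We note that this algorithm can be implemented by a circuit of size $O(n \log n)$ and
depth $O(\log n)$.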


2.4 Explicit constructions of expander graphs 


In order to create explicit constructions of expander codes, we will need to make use of 


explicit constructions of expander graphs. In this section, we will survey some of what
is known about explicit constructions of expander graphs. The only facts in this section
that are necessary for the results of this Chapter are Lemma 2.4.6, Lemma 2.4.14, and
Theorem 2.4.7. In Chapter 3, we will also make use of Proposition 2.4.19.

The expansion properties of a graph are strongly related to its eigenvalues. We recall 
their definition: 

Let $G$ be a graph on $n$ vertices. The adjacency matrix of $G$ is the $n$-by-$n$ matrix, $A$,
with entries $(a_{i,j})$ such that

\[ a_{i,j} = \begin{cases} 1, & \text{if } (i,j) \text{ is an edge of } G, \text{ and} \\ 0, & \text{otherwise.} \end{cases} \]


When we say “the eigenvalues of G”, we mean the eigenvalues of this matrix. 


Remark 2.4.1 Some researchers normalize the entries in the adjacency matrix so that its 
eigenvalues have absolute value at most 1. It is also common to study the Laplacian of a 
graph: the matrix that has the degree of vertex $i$ as the $(i,i)$-th entry, $-1$ as the $(i,j)$-th
entry if $i$ and $j$ are adjacent, and 0 otherwise. This matrix has the advantage of being positive


semi-definite. 


We will now restrict our discussion to graphs that are regular of some degree d. Note 
that d will be an eigenvalue of these graphs, corresponding to the eigenvector that is 1 in 
each entry. It is fairly easy to show that the multiplicity of the eigenvalue d will be equal 
to the number of connected components of $G$. Henceforth, we will only consider connected
graphs $G$. It is again fairly easy to show that $-d$ will be an eigenvalue if and only if the
graph is bipartite. All other eigenvalues must lie between $-d$ and $d$.

A graph will be a good expander if all eigenvalues other than $d$ have small absolute
value.² For example, a large separation of eigenvalues implies that the number of edges
between two sets of vertices is roughly the number expected if the sets were chosen at 


random: 


Theorem 2.4.2 [Beigel-Margulis-Spielman] Let $G = (V, E)$ be a $d$-regular connected graph
on $n$ vertices such that every eigenvalue other than $d$ has absolute value at most $\lambda$. Let $X$ and
$Y$ be subsets of $V$ of sizes $\alpha n$ and $\beta n$ respectively. Then

\[ \Big|\, |\{(x,y) \in X \times Y : (x,y) \in E\}| - d\alpha\beta n \,\Big| \le n\lambda\sqrt{(\alpha - \alpha^2)(\beta - \beta^2)}. \]


²The analogous statement holds for bipartite graphs if all eigenvalues other than $d$ and $-d$ are small, but
we will not discuss that situation here.


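Proof: Let $A$ be the adjacency matrix of the graph $G$, and let $\vec{x}$ and $\vec{y}$ be the
characteristic vectors of the sets $X$ and $Y$ respectively (i.e. the vector that is 1 for each vertex in
the set, and 0 otherwise).

We will compute $\langle A\vec{x}, \vec{y} \rangle$ in two ways, where $\langle \cdot, \cdot \rangle$ denotes the standard inner product of
vectors. We first observe that this term counts the number of edges from vertices in $X$ to
vertices in $Y$. (If $X$ and $Y$ intersect, then each edge in their intersection is counted twice
as it can be viewed as going from $X$ to $Y$ in two ways.) So,

\[ \langle A\vec{x}, \vec{y} \rangle = |\{(x,y) \in X \times Y : (x,y) \in E\}|. \]

Let $\vec{v} = \vec{x} - \alpha\vec{1}$ and let $\vec{w} = \vec{y} - \beta\vec{1}$. A simple calculation reveals that $\vec{v}$ and $\vec{w}$ are
perpendicular to $\vec{1}$. Because $A$ is a symmetric matrix, we also know that $A\vec{v}$ is perpendicular
to $\vec{1}$.

We compute

\begin{align*}
\langle A\vec{x}, \vec{y} \rangle &= \langle A(\vec{v} + \alpha\vec{1}), (\vec{w} + \beta\vec{1}) \rangle \\
&= \langle A\vec{v}, \vec{w} \rangle + \alpha\beta\langle A\vec{1}, \vec{1} \rangle \\
&= \langle A\vec{v}, \vec{w} \rangle + d\alpha\beta n,
\end{align*}

because $A\vec{v}$ is perpendicular to $\vec{1}$, and $\vec{w}$ is perpendicular to $A\vec{1}$. Recall that for two vectors
$a$ and $b$, $\langle a, b \rangle \le \sqrt{\langle a, a \rangle \langle b, b \rangle}$. We combine this with the observations that

\begin{align*}
\langle \vec{v}, \vec{v} \rangle &= \alpha(1 - \alpha)n, \\
\langle \vec{w}, \vec{w} \rangle &= \beta(1 - \beta)n, \text{ and}
\end{align*}

$\vec{v}$ is perpendicular to the eigenvector corresponding to $d$ to obtain

\begin{align*}
|\langle A\vec{v}, \vec{w} \rangle| &\le \lambda n\sqrt{(\alpha - \alpha^2)(\beta - \beta^2)} \implies \\
|\langle A\vec{x}, \vec{y} \rangle - d\alpha\beta n| &\le \lambda n\sqrt{(\alpha - \alpha^2)(\beta - \beta^2)}. \qquad \Box
\end{align*}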
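As a sanity check, the inequality of Theorem 2.4.2 can be verified exhaustively on a small graph whose eigenvalues are known in closed form. The sketch below uses the complete graph on $n$ vertices, which is $(n-1)$-regular with every other eigenvalue equal to $-1$, so $\lambda = 1$. The function name and the floating-point tolerance are assumptions, and the check is exponential in $n$, so it is only meant for tiny cases.

```python
from itertools import combinations
import math

def edge_count_bound_holds_on_complete_graph(n):
    """Check the bound of Theorem 2.4.2 for every pair of vertex subsets of K_n."""
    d, lam = n - 1, 1.0
    subsets = [set(s) for k in range(n + 1) for s in combinations(range(n), k)]
    for X in subsets:
        for Y in subsets:
            alpha, beta = len(X) / n, len(Y) / n
            # Ordered pairs (x, y) in X x Y joined by an edge: in K_n, any x != y.
            edges = sum(1 for x in X for y in Y if x != y)
            deviation = abs(edges - d * alpha * beta * n)
            bound = n * lam * math.sqrt((alpha - alpha ** 2) * (beta - beta ** 2))
            if deviation > bound + 1e-9:   # small tolerance for rounding error
                return False
    return True
```

On the complete graph the bound is attained with equality when $X = Y$, which is why the comparison allows a small tolerance.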


We will use this result to derive Tanner’s theorem [Tan84], which shows that if $\lambda$ is
small, then $G$ is an expander.


Theorem 2.4.3 [Tanner] Let $G$ be a $d$-regular graph on $n$ vertices such that all eigenvalues
other than $d$ have absolute value at most $\lambda$. Then, for any subset $X$ of the vertices of $G$, the
neighbors of $X$, $N(X)$, must have size

\[ |N(X)| \ge \frac{d^2|X|}{\lambda^2 + (d^2 - \lambda^2)|X|/n}. \]


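Proof: Let $X$ have size $\alpha n$. Let $Y$ be the set of non-neighbors of $X$, and assume $Y$ has
size $\beta n$. Because there are no edges between $X$ and $Y$, Theorem 2.4.2 implies that

\begin{align*}
d\alpha\beta n &\le n\lambda\sqrt{(\alpha - \alpha^2)(\beta - \beta^2)} \implies \\
d^2\alpha\beta &\le \lambda^2(1 - \alpha)(1 - \beta) \implies \\
\beta\left(\alpha(d^2 - \lambda^2) + \lambda^2\right) &\le \lambda^2(1 - \alpha) \implies \\
1 - \beta &\ge \frac{\alpha d^2}{\alpha(d^2 - \lambda^2) + \lambda^2}. \qquad \Box
\end{align*}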
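The bound of Theorem 2.4.3 is easy to evaluate for concrete parameters. The helper below is a small illustrative sketch; the function name `tanner_lower_bound` is an assumption.

```python
def tanner_lower_bound(n, d, lam, size):
    """Lower bound on |N(X)| from Theorem 2.4.3 for a set X with |X| = size.

    n: number of vertices, d: degree, lam: bound on the other eigenvalues.
    """
    return d * d * size / (lam * lam + (d * d - lam * lam) * size / n)
```

Two extreme cases show how the bound behaves: when `lam` equals `d` it only guarantees `size` neighbors, that is, no expansion at all, and as `lam` approaches 0 it approaches `n`. Smaller values of $\lambda$ therefore give stronger expansion guarantees.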


Remark 2.4.4 One can similarly show that a d-regular bipartite graph will be a good expander 
if every eigenvalue other than d and —d is small. These results do not actually depend on the 
graph being d-regular. One can prove similar statements so long as there is a separation between 


the first and second largest eigenvalues. 


Remark 2.4.5 Kahale [Kah93b] improves this theorem by showing that graphs with a similar 


eigenvalue separation have even better expansion.


The property of expander graphs that will be most useful to us in this dissertation is
that small subsets of vertices in an expander have very small induced subgraphs:


Lemma 2.4.6 [Alon-Chung [AC88]] Let $G$ be a $d$-regular graph on $n$ vertices such that all
eigenvalues other than $d$ have absolute value at most $\lambda$. Let $X$ be a subset of the vertices of $G$
of size $\gamma n$. Then, the number of edges contained in the subgraph induced by $X$ in $G$ is at most

\[ \frac{n}{2}\,d\left(\gamma^2 + \frac{\lambda}{d}\gamma(1 - \gamma)\right). \]

Proof: Follows from Theorem 2.4.2 by setting $Y = X$ and observing that we count every
edge of the induced subgraph twice. □


It is natural to ask how large a separation can exist between the eigenvalue d and 
the next-largest eigenvalue. Lubotzky, Phillips and Sarnak [LPS88] and, independently, 
Margulis [Mar88] provide an explicit construction of infinite families of graphs in which 


there is a very large separation. 


Theorem 2.4.7 [Lubotzky-Phillips-Sarnak, Margulis] For every pair of primes $p, q$ congruent
to 1 modulo 4 such that $p$ is a quadratic residue modulo $q$, there is an easily constructible
$(p + 1)$-regular graph with $q(q^2 - 1)/2$ vertices such that the second-largest eigenvalue of the
graph is at most $2\sqrt{p}$.


We will hereafter refer to these graphs as LPS-M graphs. 
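The hypotheses of Theorem 2.4.7 are simple to test for a given pair of primes. The sketch below only checks the hypotheses and reports the parameters promised by the theorem; it does not construct the graph. The function names are assumptions, primality is tested by trial division, and quadratic residuosity is tested with Euler's criterion.

```python
import math

def is_prime(m):
    """Trial-division primality test, adequate for small illustrative inputs."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def lps_parameters(p, q):
    """Return (degree, number of vertices, eigenvalue bound) if (p, q) satisfies
    the hypotheses of Theorem 2.4.7, and None otherwise."""
    if p == q or not (is_prime(p) and is_prime(q)):
        return None
    if p % 4 != 1 or q % 4 != 1:
        return None
    # Euler's criterion: p is a quadratic residue mod q iff p^((q-1)/2) = 1 (mod q).
    if pow(p, (q - 1) // 2, q) != 1:
        return None
    return p + 1, q * (q * q - 1) // 2, 2 * math.sqrt(p)
```

For instance, $p = 5$ and $q = 29$ qualify (since $11^2 \equiv 5 \pmod{29}$) and give a 6-regular graph on 12180 vertices, while $p = 5$ and $q = 13$ do not, because 5 is not a quadratic residue modulo 13.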


Remark 2.4.8 LPS-M graphs are Cayley graphs of $PSL(2, \mathbb{Z}/q\mathbb{Z})$. One can find a
representation of these graphs so that the names of the neighbors of a vertex can be computed in


poly-logarithmic time from the name of that vertex. 


Alon and Boppana [Alo86] have proved that the separation of eigenvalues achieved by 


LPS-M graphs is optimal: 


Theorem 2.4.9 [Alon-Boppana] Let $G_n$ be a sequence of $k$-regular graphs on $n$ vertices and
let $\lambda_n$ be the second-largest eigenvalue of $G_n$. Then
\[ \limsup_{n \to \infty} \lambda_n \ge 2\sqrt{k-1}. \]




Alon [Alo86] has proved that every expander graph must have some separation of eigenvalues.


Theorem 2.4.10 Let $G$ be a $d$-regular graph on $n$ vertices such that for all sets $S$ of size at
most $n/2$,
\[ |N(S)| \ge (1 + c)\,|S|. \]
Then, all eigenvalues of $G$ but $d$ have absolute value at most
\[ d - \frac{c^2}{4 + 2c^2}. \]


LPS-M graphs will suffice for our constructions in Section 2.5. However, a more general 


class of graphs will suffice as well. The following definition captures this class of graphs: 


Definition 2.4.11 We will say that a family of graphs $\mathcal{G}$ is a family of good expander graphs
if $\mathcal{G}$ contains graphs $G_{n,d}$ of $n$ nodes and degree $d$ so that

• for an infinite number of values of $d$, there exists an infinite number of graphs $G_{n,d} \in \mathcal{G}$,
and

• for each of these values $d$, the second-largest eigenvalues of $G_{n,d}$ are bounded from above
by constants $\lambda_d$ such that $\lim_{d \to \infty} \lambda_d/d = 0$.


Pippenger [Pip93] points out that we can obtain a family of good expander graphs
by exponentiating the expander graphs constructed by Gabber and Galil. Gabber and
Galil [GG81] construct an infinite family of 5-regular expander graphs. By Theorem 2.4.10,
all the eigenvalues of these graphs other than 5 must be bounded by some number $\lambda < 5$.
Consider the graphs obtained by exponentiating the adjacency matrices of these graphs
$k$ times. They will have degree $5^k$, and all eigenvalues but the largest will have absolute
value at most $\lambda^k$. So, these yield a family of good expander graphs. These graphs will
have multiple edges and self-loops, but this is irrelevant in our applications. Moreover, one
could remove the multiple edges and self-loops without adversely affecting the quality of
the expanders.
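
The effect of exponentiating an adjacency matrix can be seen on a toy example. The sketch below uses the 7-cycle (2-regular) rather than a Gabber-Galil graph, purely as an assumed stand-in: the $k$-th power has row sums $d^k$ and acquires self-loops and multiple edges, as noted above. (The eigenvalue claim is the standard fact that $Av = \lambda v$ implies $A^k v = \lambda^k v$.)

```python
def cycle_adjacency(n):
    """Adjacency matrix of the n-cycle, a 2-regular graph."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        a[i][(i + 1) % n] = 1
        a[i][(i - 1) % n] = 1
    return a

def mat_mul(x, y):
    """Product of two square integer matrices."""
    n = len(x)
    return [[sum(x[i][t] * y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(a, k):
    """k-th power of a square integer matrix, for k >= 1."""
    result = a
    for _ in range(k - 1):
        result = mat_mul(result, a)
    return result

A = cycle_adjacency(7)
A2 = mat_power(A, 2)   # entry (i, j) counts walks of length 2 from i to j
A3 = mat_power(A, 3)   # entry (i, j) counts walks of length 3 from i to j
```

Here `A2[0][0]` is 2 (a doubled self-loop) and `A3[0][1]` is 3 (a triple edge), while every row of `A2` sums to 4 and every row of `A3` sums to 8.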

To construct unbalanced bipartite expander graphs, we will take the edge-vertex incidence
graphs of ordinary expanders. From a $d$-regular graph $G$ on $n$ vertices, we will derive
a $(2, d)$-regular graph with $dn/2$ vertices on one side and $n$ vertices on the other.


Definition 2.4.12 Let $G$ be a graph with edge set $E$ and vertex set $V$. The edge-vertex
incidence graph of $G$ is the bipartite graph with vertex set $E \cup V$ and edge set
\[ \{(e, v) \in E \times V : v \text{ is an endpoint of } e\}. \]
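
A minimal sketch of Definition 2.4.12 follows. The representation (two adjacency maps) and the example graph $K_5$ are assumptions chosen for brevity.

```python
def edge_vertex_incidence(edges):
    """Return the bipartite incidence graph as two adjacency maps.

    edge_side[i] lists the two endpoints of edge i (degree 2);
    vertex_side[v] lists the indices of the edges touching v (degree d
    when the input graph is d-regular).
    """
    edge_side = {i: [u, v] for i, (u, v) in enumerate(edges)}
    vertex_side = {}
    for i, (u, v) in enumerate(edges):
        vertex_side.setdefault(u, []).append(i)
        vertex_side.setdefault(v, []).append(i)
    return edge_side, vertex_side

# Illustrative input: the complete graph K_5, which is 4-regular.
K5_EDGES = [(u, v) for u in range(5) for v in range(u + 1, 5)]
EDGE_SIDE, VERTEX_SIDE = edge_vertex_incidence(K5_EDGES)
```

With $d = 4$ and $n = 5$, the result has $dn/2 = 10$ nodes of degree 2 on one side and 5 nodes of degree 4 on the other.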


Our analyses of these graphs will follow from Lemma 2.4.6.

For some constructions, we will want graphs of higher degree. To construct these, we
use a more general construction due to Ajtai, Komlós and Szemerédi [AKS87]:


Definition 2.4.13 Let $G$ be a graph with edge set $E$ and vertex set $V$. The $k$-path-vertex
incidence graph of $G$ is a bipartite graph between the set of paths of length $k$ in $G$ and the
vertices of $G$ in which a vertex of $G$ is connected to each path on which it lies. (A path of
length $k$ is a sequence of vertices $v_1, \ldots, v_{k+1}$ such that $(v_i, v_{i+1}) \in E$ for each $1 \le i \le k$.)


We analyze the expansion properties of these graphs through the following fact proved by 
Kahale [Kah93a]. 


Lemma 2.4.14 [Kahale] Let $G_{n,d}$ be a $d$-regular graph on $n$ nodes with second-largest
eigenvalue bounded by $\lambda$. Let $S$ be a subset of the vertices of $G_{n,d}$ of size $\gamma n$. Then, the
number of paths of length $k$ contained in the subgraph induced by $S$ in $G_{n,d}$ is at most
\[ \gamma n d^k \left(\gamma + \frac{\lambda}{d}(1-\gamma)\right)^k. \]


A proof of this fact can also be found in [AFWZ]. The difference of a factor of two between
Lemma 2.4.14 when $k = 1$ and Lemma 2.4.6 is due to the fact that in Lemma 2.4.14 each
path is being counted twice: once for each endpoint.

2.4.1 Expander graphs of every size


It is natural to wonder how often the conditions of Theorem 2.4.7 are met. By the law
of quadratic reciprocity, $p$ is a quadratic residue modulo $q$ if and only if $q$ is a quadratic
residue modulo $p$, for $p$ and $q$ primes congruent to 1 modulo 4. Thus, for a given $p$, we
can bound the gaps between successive $q$'s that satisfy the conditions of Theorem 2.4.7 by
bounding the sizes of gaps between successive primes in arithmetic progressions. One can
do this using the following theorem of Heilbronn [Hei33]:


Theorem 2.4.15 [Heilbronn] Let $\pi(x; k, l)$ denote the number of primes less than $x$ and
congruent to $l$ modulo $k$. Then, there exists a constant $a < 1$ such that
\[ \lim_{x \to \infty} \bigl(\pi(x + x^{a}; k, l) - \pi(x; k, l)\bigr)\,\frac{\ln x}{x^{a}} = \frac{1}{\phi(k)}, \]
where $l$ is relatively prime to $k$ and $\phi(k)$ denotes the number of positive integers less than and
relatively prime to $k$.


Definition 2.4.16 Let $\mathcal{G} = \{G_i\}_{i=1}^{\infty}$ be a sequence of graphs such that $G_i$ has $n_i$ vertices,
and $n_{i+1} > n_i$. We say that $\mathcal{G}$ is dense if $n_{i+1} - n_i = o(n_i)$.


Theorem 2.4.15 implies that LPS-M graphs are dense. It is not difficult to see that the 
graphs obtained by exponentiating the expander graphs of Gabber and Galil are dense as 
well. 
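
The quadratic-reciprocity step above can be checked numerically. The sketch below lists, for a fixed $p$, the primes $q$ admitted by Theorem 2.4.7 together with the resulting vertex counts. The helper names are hypothetical, and the small values shown only illustrate the admissibility condition; density is an asymptotic statement that such a short list cannot exhibit.

```python
import math

def is_prime(m):
    """Trial-division primality test, adequate for small examples."""
    if m < 2:
        return False
    for t in range(2, math.isqrt(m) + 1):
        if m % t == 0:
            return False
    return True

def is_qr(a, q):
    """Euler's criterion: is a a nonzero quadratic residue mod the odd prime q?"""
    return pow(a % q, (q - 1) // 2, q) == 1

def admissible_q(p, limit):
    """Primes q < limit with q = 1 (mod 4), q != p, and p a residue mod q."""
    return [q for q in range(5, limit, 4)
            if q != p and is_prime(q) and is_qr(p, q)]

def vertex_counts(p, limit):
    """Sizes q(q^2 - 1)/2 of the LPS-M graphs of degree p + 1 with q < limit."""
    return [q * (q * q - 1) // 2 for q in admissible_q(p, limit)]

QS = admissible_q(5, 100)
```

For $p = 5$ the admissible $q$ below 100 are exactly the primes congruent to 1 modulo 4 that are also congruent to $\pm 1$ modulo 5, in agreement with reciprocity.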

A dense family provides expander graphs of size close to almost every number; but, we 
will need good expanders of every sufficiently large number of vertices for our constructions 
in Chapter 3. We build such a family from a dense family by removing a few nodes from 


the expanders in such a way that they remain good expanders. 


Definition 2.4.17 Let $G$ be a graph. If $S$ is a subset of the vertices of $G$ such that no two
vertices of $S$ are neighbors or share a common neighbor, then we call $S$ a doubly independent
set of vertices.


If $G$ is a $d$-regular graph with $n$ vertices, then it is easy to see that it will have a
doubly-independent set of at least $n/(d^2 + 1)$ vertices. Moreover, we can show that if a
doubly-independent set of vertices is removed from a good expander graph, then it remains a good
expander graph.
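
One way to see the $n/(d^2 + 1)$ bound is a greedy argument: each chosen vertex rules out at most itself, its $d$ neighbors and their $d(d-1)$ further neighbors. The sketch below is an illustration of that argument, not an algorithm given in the text; the function names and the 10-cycle example are assumptions.

```python
def greedy_doubly_independent(adj):
    """Greedily pick vertices, no two adjacent or sharing a neighbor.

    adj maps each vertex to the list of its neighbors.
    """
    blocked = set()
    chosen = []
    for v in sorted(adj):
        if v in blocked:
            continue
        chosen.append(v)
        blocked.add(v)
        for u in adj[v]:
            blocked.add(u)          # vertices at distance 1 from v
            blocked.update(adj[u])  # vertices at distance 2 from v
    return chosen

def is_doubly_independent(adj, chosen):
    """Check that no two chosen vertices are within distance 2."""
    chosen_set = set(chosen)
    for v in chosen:
        reach = set(adj[v])
        for u in adj[v]:
            reach.update(adj[u])
        reach.discard(v)
        if reach & chosen_set:
            return False
    return True

# Illustrative graph: the 10-cycle (d = 2), so n / (d^2 + 1) = 2.
CYCLE = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
CHOSEN = greedy_doubly_independent(CYCLE)
```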


Proposition 2.4.18 Let G be a d-regular graph on n vertices such that all eigenvalues but 
the largest have absolute value at most $\lambda$. Let G’ be a graph obtained from G by removing a
doubly-independent set of vertices. Then, for any subset X of the vertices of G’, the neighbors 
of X, N(X), must have size 


[N(X)| > |X| (ae?) 


Proof: We will show that the set of neighbors of X in G’ can be smaller than its set 
of neighbors in G by at most one neighbor for each vertex in X. Let S denote the set of 
vertices that were removed from G to obtain G’. The only vertices that could be neighbors 
of X in G but not in G’ are those that are in S. However, if there are two vertices in S
that are both neighbors of one vertex in X, then S could not be a doubly independent set.
Thus, the proposition follows from Theorem 2.4.3. □


Proposition 2.4.19 Let $G$ be a $d$-regular graph on $n$ vertices such that all eigenvalues but
the largest have absolute value at most $\lambda$. Let $G'$ denote a graph that can be obtained from $G$
by removing a doubly-independent set of vertices, $S$. Let $X$ be a subset of the vertices of $G'$ of
size $\gamma n$. Then, the number of edges contained in the subgraph induced by $X$ in $G'$ is at most
\[ \frac{nd}{2}\left(\gamma^2 + \frac{\lambda}{d}\,\gamma(1-\gamma)\right). \]
Moreover, each vertex of $G'$ has degree either $d$ or $d - 1$.


Proof: Follows from Lemma 2.4.6 because the subgraph induced by $X$ in $G'$ is the same
as the subgraph induced by $X$ in $G$. □


Remark 2.4.20 Not only can we find expander graphs with any number of vertices, but we 
can find expander graphs with any number of edges as well. We need merely choose a graph 


that has a few more edges than we desire, and remove a few of them. It is easy to remove edges
so that each vertex still has degree $d$ or $d - 1$ and the graph has expansion properties similar to
those obtained in Propositions 2.4.18 and 2.4.19.


For more information about expander graphs and algebraic graph theory, we recommend:

• Nabil Kahale. Expander Graphs. PhD thesis, M.I.T., September 1993.

• Norman Biggs. Algebraic Graph Theory. Cambridge University Press, New York, NY,
second edition, 1993.

• Dragos M. Cvetkovic, Michael Doob, and Horst Sachs. Spectra of Graphs: Theory and
Application. Academic Press, New York, 1990.


2.5 Explicit constructions of expander codes 


Let $G$ be a $d$-regular graph in which every eigenvalue but $d$ has absolute value at most $\lambda$.
Let $B$ be the edge-vertex incidence graph of $G$. We will construct codes of the form $C(B, S)$,
where $S$ is itself a fairly good code of constant block length (recall Definition 2.2.1). The
reason that we will be able to decode these codes is that when a constraint is unsatisfied, we
know much more than just this bit of information. Usually, there is a codeword of $S$ that is
closest to the word described by the variables in the constraint. If this word is sufficiently
close to the codeword, then the algorithm will flip those variables that must be changed in
order to obtain that codeword.

Assume that $S$ has minimum relative distance $\varepsilon$.


Parallel explicit expander code decoding round

• In parallel, for each constraint, if the setting of the variables in that constraint is
within $\varepsilon/4$ of a codeword, then send a “flip” message to every variable that needs to
be flipped to obtain that codeword.

• In parallel, every variable that receives the message “flip” flips its value.


Remark 2.5.1 One parallel explicit expander code decoding round can be performed by $O(n)$
processors in constant time. Similarly, one could implement this as a linear-size circuit of
constant depth.
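
The decoding round described above can be sketched as follows. Everything concrete here is an assumption made for illustration: the graph is $K_9$ (8-regular), the subcode $S$ is the first-order Reed-Muller code of length 8 (minimum distance 4, so $\varepsilon = 1/2$), the nearest codeword is found by brute force, and "within $\varepsilon/4$" is read as Hamming distance at most $\varepsilon d/4$. The rounds are simulated sequentially rather than in parallel.

```python
from itertools import product

def rm_1_3_codewords():
    """The 16 codewords of the first-order Reed-Muller code of length 8
    (minimum distance 4), used only as a small stand-in for S."""
    words = []
    for a0, a1, a2, a3 in product((0, 1), repeat=4):
        words.append(tuple(
            a0 ^ (a1 & (i & 1)) ^ (a2 & ((i >> 1) & 1)) ^ (a3 & ((i >> 2) & 1))
            for i in range(8)))
    return words

def hamming(x, y):
    """Hamming distance between two equal-length sequences."""
    return sum(a != b for a, b in zip(x, y))

def decoding_round(word, incident, subcode, min_dist):
    """One round. word[e] is the bit on edge e; incident[v] lists the edges
    at vertex v in a fixed order; min_dist is the minimum distance of the
    subcode (epsilon times its block length)."""
    flips = set()
    for edges_at_v in incident.values():
        local = tuple(word[e] for e in edges_at_v)
        nearest = min(subcode, key=lambda c: hamming(c, local))
        # "within epsilon/4": Hamming distance at most min_dist / 4
        if 4 * hamming(nearest, local) <= min_dist:
            flips.update(e for e, a, b in zip(edges_at_v, local, nearest)
                         if a != b)
    # every variable that received at least one "flip" message flips once
    return [bit ^ 1 if e in flips else bit for e, bit in enumerate(word)]

# Illustrative graph: K_9 is 8-regular, matching the block length 8.
EDGES = [(u, v) for u in range(9) for v in range(u + 1, 9)]
INCIDENT = {v: [i for i, e in enumerate(EDGES) if v in e] for v in range(9)}
SUBCODE = rm_1_3_codewords()
```

Starting from the all-zero codeword with two flipped edges, a single call to `decoding_round(word, INCIDENT, SUBCODE, 4)` restores the zero word in this toy setting.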

Lemma 2.5.2 Let $S$ be a linear code with rate $r$, minimum relative distance $\varepsilon$ and block-length
$d$. Let $G$ be a $d$-regular graph such that all eigenvalues but $d$ have absolute value at most $\lambda$.
Let $B$ be the edge-vertex incidence graph of $G$. Then $C(B, S)$ has rate $2r - 1$ and minimum
relative distance at least
\[ \delta = \gamma^2 + (\gamma - \gamma^2)\frac{\lambda}{d}, \quad\text{where}\quad \gamma d + (1-\gamma)\lambda \le \varepsilon d. \]
Moreover, if
\[ \frac{\varepsilon^2}{768} > \frac{\varepsilon}{16}\,\frac{\lambda}{d}, \]
then the iteration of $O(\log n)$ parallel explicit expander code decoding rounds will correct an
$\varepsilon^2/64$ fraction of errors. This algorithm can be simulated on a sequential machine in linear time.


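Proof: Lemma 2.4.6 implies that $B$ expands by a factor of at least
\[ \frac{\gamma n}{\frac{nd}{2}\left(\gamma^2 + \frac{\lambda}{d}(\gamma - \gamma^2)\right)} \]
on sets of variables of size
\[ \frac{nd}{2}\left(\gamma^2 + \frac{\lambda}{d}(\gamma - \gamma^2)\right). \]
Using this bound, the bounds on the rate and minimum distance of $C(B, S)$ follow
immediately from Theorem 2.2.2.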

Let $X$ be a set of at most $\alpha nd/2$ variables. We will show that if a decoding round is given
a word that is 1 for all $v \in X$, and 0 elsewhere, then it will output a word whose weight is
less by a constant factor, provided that $\alpha$ is not too big.

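We will say that a constraint is confused if it sends a “flip” message to a variable that
is not in $X$. In order for this to happen, the constraint must have at least $\frac{3}{4}\varepsilon d$ variables of
$X$ as neighbors (a nonzero codeword of $S$ has weight at least $\varepsilon d$, and the constraint's word
is within $\frac{\varepsilon}{4} d$ of it). Each variable of $X$ can be a neighbor of two constraints, so there can be
at most
\[ \frac{\alpha\frac{nd}{2}\cdot 2}{\frac{3}{4}\varepsilon d} = \frac{4\alpha n}{3\varepsilon} \]
confused constraints. Each of these can send at most $\frac{\varepsilon}{4} d$ “flip” signals, so at most
\[ \frac{4\alpha n}{3\varepsilon}\cdot\frac{\varepsilon d}{4} = \frac{nd}{2}\cdot\frac{2\alpha}{3} \]
variables not in $X$ will receive “flip” signals.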


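We will call a constraint unhelpful if it has more than $\frac{\varepsilon}{4} d$ neighbors in $X$. A variable in
$X$ will flip unless both of its neighbors are unhelpful constraints. There are at most
\[ \frac{\alpha\frac{nd}{2}\cdot 2}{\frac{\varepsilon}{4} d} = \frac{4\alpha n}{\varepsilon} \]
unhelpful constraints, which by Lemma 2.4.6 can have at most
\[ \frac{nd}{2}\left(\left(\frac{4\alpha}{\varepsilon}\right)^2 + \frac{4\alpha}{\varepsilon}\,\frac{\lambda}{d}\right) \]
edges among them. So, the weight of the output word must decrease by at least
\[ \frac{nd}{2}\left(\alpha - \frac{2\alpha}{3} - \left(\frac{4\alpha}{\varepsilon}\right)^2 - \frac{4\alpha}{\varepsilon}\,\frac{\lambda}{d}\right). \]
The worst case is when $\alpha$ is maximized. If we set $\alpha = \varepsilon^2/64$, then we obtain
\[ \frac{nd}{2}\left(\frac{\varepsilon^2}{768} - \frac{\varepsilon}{16}\,\frac{\lambda}{d}\right). \]
Thus, the weight of the output word must decrease by a constant factor.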


To simulate this algorithm in linear time, one should maintain a list of the unsatisfied
constraints. When simulating any round after the first, the algorithm should only consider
the variables that appear in those constraints. We have proved that the sizes of these sets
of variables have a bound that decreases by a constant factor with each iteration, so the total
time required to simulate the $O(\log n)$ rounds will be linear. □


To obtain an infinite family of good codes, we will use a family of good expander graphs 


and good codes known to exist by Theorem 1.4.1. 


Theorem 2.5.3 For all $\varepsilon$ such that $1 - 2H(\varepsilon) > 0$, where $H(\cdot)$ is the binary entropy function,
there exists an easily constructible family of linear codes with rate $1 - 2H(\varepsilon)$ and minimum
relative distance arbitrarily close to $\varepsilon^2$ such that the parallel decoding algorithm will decode up
to an $\varepsilon^2/64$ fraction of errors using $O(\log n)$ parallel explicit expander code decoding rounds. This
decoding algorithm can be simulated in linear time on a sequential machine.


Proof: Any family of good expander graphs will suffice. For example, we could use those
known to exist by Theorem 2.4.7. Let $G_{n,d}$ denote a graph of degree $d$ on $n$ vertices from
this family, if such a graph exists. Let $\lambda_d$ be an upper bound on the eigenvalues other than
$d$ in the graphs $G_{n,d}$. Let $B_{n,d}$ denote the edge-vertex incidence graph of $G_{n,d}$.

As we let $d$ grow large in Lemma 2.5.2, the $\lambda_d/d$ terms tend to zero. Moreover,
Theorem 1.4.1 tells us that, for sufficiently large block-length $d$, there are linear codes of rate $r$
and minimum relative distance $\varepsilon$ if $r < 1 - H(\varepsilon)$. We use such a code as the subcode in the
construction of Lemma 2.5.2. □
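
To get a feel for the trade-off in Theorem 2.5.3, the following sketch evaluates the limiting parameters for a given $\varepsilon$. It assumes the decodable fraction in the theorem reads $\varepsilon^2/64$ (the value obtained in the proof of Lemma 2.5.2) and treats the distance $\varepsilon^2$ as the limit approached as $d$ grows; the function names are hypothetical.

```python
import math

def binary_entropy(x):
    """H(x) = -x log2 x - (1 - x) log2 (1 - x), with H(0) = H(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def theorem_2_5_3_parameters(eps):
    """Limiting parameters as d grows, or None if 1 - 2H(eps) <= 0."""
    rate = 1 - 2 * binary_entropy(eps)
    if rate <= 0:
        return None
    return {"rate": rate,
            "relative_distance_limit": eps ** 2,
            "decodable_fraction": eps ** 2 / 64}
```

For $\varepsilon = 0.05$ the rate is roughly 0.43, while $\varepsilon = 0.2$ already violates $1 - 2H(\varepsilon) > 0$.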


2.5.1 A generalization 


Noga Alon has pointed out that the “edge to vertex” construction that we use to construct
expander codes is a special case of a construction due to Ajtai, Komlós and Szemerédi [AKS87].
They construct unbalanced expander graphs from regular expander graphs
by identifying the large side of their graph with all paths of length $k$ in the original graph
and the small side of their graph with the vertices of the original graph. A node identified
with a path is connected to the nodes identified with the vertices along that path. The
construction used in Lemma 2.5.2 is the special case in which $k = 1$.

Alon suggested that the codes produced by applying the technique of Theorem 2.5.3 to
this more general class of graphs can be analyzed by applying Lemma 2.4.14.


Theorem 2.5.4 For all integers $k \ge 2$ and all $\varepsilon$ such that $1 - kH(\varepsilon) > 0$, there exists an easily
constructible family of linear codes with rate $1 - kH(\varepsilon)$ and minimum relative distance arbitrarily
close to $\varepsilon^{1+1/(k-1)}$ such that a parallel decoding algorithm will decode some $O(\varepsilon^{1+1/(k-1)})$
fraction of errors in $O(\log n)$ parallel decoding rounds, each of which takes constant time.
Moreover, this algorithm can be implemented to run in linear time on a sequential machine.

Proof: [Sketch] As in the proof of Theorem 2.5.3, we will use the graphs known to exist by
Theorem 2.4.7. We will form a code by using the $k$-path-vertex incidence graph of $G_{n,p+1}$ and
by using a Gilbert-Varshamov good code as the subcode. Theorem 1.4.1 implies that for
sufficiently large block-length $p + 1$, there exist linear codes of rate $r$ and minimum relative
distance $\varepsilon$ provided that $r < 1 - H(\varepsilon)$. Moreover, as $p + 1$ grows large, the term involving
$\lambda$ in Lemma 2.4.14 goes to zero. If this term were zero, then we would know that every set
containing an e® fraction of the variables has at least an e# fraction of the constraints 
as neighbors. Because there are nd* variables of degree k and n constraints of degree kd*, 
Theorem 2.2.2 would imply that no word of weight up to c® can be a codeword. However, 
because A never actually reaches zero, we can only come arbitrarily close to this bound. 

To decode these codes, we modify the parallel explicit expander code decoding algorithm
so that each constraint sends a “flip” message to its variables only if the setting of its
variables is within $\varepsilon/h$ of a codeword, for some constant $h$ depending on $k$. An analysis
similar to that in the proof of Lemma 2.5.2 shows that if an $\alpha$ fraction of the variables
were corrupted at the start of a round, then at most an

a ah (a (1-28) 

fraction of the variables will be corrupt after the decoding round. We can choose a to be 
some fixed fraction of e!t+!/* and a value for h so that this term is less than a. 


The idea behind the sequential simulation of this algorithm is the same as appeared in
the proof of Lemma 2.5.2. □


2.6 Notes on implementation and experimental results 


We imagine using expander codes in coding situations where long block length is required. 
For small block lengths, special codes are known that will provide better performance. 
Expander codes should be especially useful for coding on write-once media, such as Compact 
Discs, where fast decoding is essential, the time required to encode is not critical, and codes 


of block length roughly 10,000 are already being used [Ima90]. 

If one intends to implement expander codes, we suggest using the randomized construction
presented in Sections 2.1.1 and 2.3. We obtained the best performance from graphs
that had degree 5 at the variables. It might be possible to get good performance with
randomly chosen graphs and slightly more complex constraints, but we have not observed
better performance from these.

One drawback of using a randomly chosen graph to generate an expander code is that, 
while the graph will be a good expander with high probability, we know of no polynomial 
time algorithm that will certify that a graph has the level of expansion that we need for 
this construction.* On the other hand, it is very easy to perform experiments to test the 
performance of an expander code and thereby weed out those that do not work well on 
average. 

We now mention a few ideas that we hope will be helpful to those interested in imple- 


menting these codes. 


• When one chooses a random graph as outlined in Section 2.1.1, there is a good chance
that the graph produced will have a “double edge”, that is, two edges between the
same two vertices. We suggest using a simple heuristic to remove one of these edges,
such as swapping its endpoint with that of a randomly chosen edge. If one is choosing
a relatively small graph, say of 2,000 nodes, then there is a fair chance that there will
be two variables that share two neighbors. Again, we suggest discarding such graphs.
In general, if one is choosing a relatively small graph, then there is a reasonable chance
that it will have some very small set of vertices with low expansion. We were always
able to screen these out by experiment.


• The sequential decoding algorithm presented in Section 2.3.1 can be improved by the
same means as many “slope-descent” algorithms. In experiments, we found that a
good way to escape local minima was to induce some random errors. The sequential
decoding algorithm also benefits from a little randomness in the choice of which
variable it flips next. We also found that we could decode more errors if we allow the
algorithm to make a limited amount of negative progress. This finding is evidenced
in Figure 2-4.

*Computation of the eigenvalues of the graph does not work because Kahale [Kah93a] has proved that
the eigenvalues cannot certify expansion greater than d/2.


• The parallel decoding algorithm presented in Section 2.3.3 seems better suited for
implementation in hardware than the sequential algorithm. The performance of this
parallel algorithm can be improved by changing the threshold at which it flips
variables. An easy improvement is to start the threshold high and decrease it only when
necessary. The algorithm performs even better if one only flips the variables that have
the maximum number of unsatisfied neighbors. Of course, many hardware
implementations will not allow this flexibility. We suggest that the implementer experiment to
discover the arrangement of thresholds that performs best in their application.
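
The double-edge heuristic from the first suggestion above can be sketched as follows. The sampling scheme is an assumption (Section 2.1.1 is not reproduced here): each constraint receives an equal number of stubs, the stubs are shuffled, and they are dealt to the variables a fixed number at a time. All names and the parameters (40 variables of degree 5, 20 constraints) are hypothetical.

```python
import random

def random_bipartite_graph(n_vars, var_deg, n_cons, rng):
    """Assumed sampling scheme: edge i joins variable i // var_deg to
    constraint stubs[i], where stubs is a shuffled list in which every
    constraint appears n_vars * var_deg / n_cons times."""
    con_deg, rem = divmod(n_vars * var_deg, n_cons)
    if rem:
        raise ValueError("degrees do not divide evenly")
    stubs = [c for c in range(n_cons) for _ in range(con_deg)]
    rng.shuffle(stubs)
    return stubs

def first_double_edge(stubs, var_deg):
    """Index of an edge that repeats an earlier edge of the same variable,
    or None if the graph has no double edge."""
    for start in range(0, len(stubs), var_deg):
        seen = set()
        for i in range(start, start + var_deg):
            if stubs[i] in seen:
                return i
            seen.add(stubs[i])
    return None

def remove_double_edges(stubs, var_deg, rng, max_swaps=100000):
    """Swap the endpoint of each double edge with that of a random edge.
    Swaps preserve all degrees; the loop stops once no double edge is left."""
    for _ in range(max_swaps):
        i = first_double_edge(stubs, var_deg)
        if i is None:
            return stubs
        j = rng.randrange(len(stubs))
        stubs[i], stubs[j] = stubs[j], stubs[i]
    raise RuntimeError("gave up removing double edges")

RNG = random.Random(0)
STUBS = remove_double_edges(random_bipartite_graph(40, 5, 20, RNG), 5, RNG)
```

Screening for variables that share two neighbors, or for small sets with poor expansion, would be a separate step and is not attempted in this sketch.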


These codes really are very easy to implement. We implemented these codes in C after 
a few hours' work and set about testing their performance. In each test, we chose a random
bipartite graph of the correct size and tested the performance of various decoding algorithms 
against random errors. For our tests, we never performed encoding—it suffices to examine 
the performance of the decoding algorithms around the 0 word. We present some results of 
our experiments so that the reader will know what to expect from these codes. However, 
we suggest that researchers implement and try them out on the error-patterns that they 


expect to encounter. 


2.6.1 Some experiments 


In our experiments, we found that expander codes of the type discussed in Section 2.3 
performed best when derived from a randomly chosen graph with degree 5 at the variables. 
We varied the rates of our codes by changing the degrees of the graphs at the constraints. 
In Figure 2-3, we present the results of experiments performed on some expander codes 
of length 40,000. We began by choosing an appropriate graph at random as described in 
Section 2.1.1. We then implemented a variation of the sequential decoding algorithm from 
Section 2.3.1 in which we allowed the algorithm to flip a limited number of variables even if 
they appeared in only two unsatisfied constraints (we call these “negative progress flips” ). 


We then tested each code on many error patterns of various sizes. For each size, we chose
50,000 error patterns of this size uniformly at random. Points in the graph indicate the
number of errors that a code corrected in all 50,000 tests. One can see that the expander
codes corrected an amazingly large number of errors.


[Plot: Some expander codes of length 40,000]

Figure 2-3: Number of errors expander codes could almost always correct. For example,
the point in the upper left hand corner of the figure indicates that the code of rate 1/2
corrected all of the 50,000 patterns of 1,720 errors on which it was tested.


In Figure 2-4, we compare the performance of a version of the sequential decoding
algorithm in which we allow negative progress flips with a version in which we do not. We chose
a rate 1/2 expander code of length 40,000. For each number of errors, we performed 2,000
tests of each algorithm and counted the number of times that each successfully corrected
the errors. While allowing negative progress flips did not seem to have much impact on the
number of errors that the algorithms could “almost always” correct, it did greatly increase
the chance that the algorithm could correct a given number of errors.

Figure 2-4: A rate 1/2 expander code of length 40,000. Comparison of the probability
that the sequential decoding algorithm corrects a given number of randomly chosen errors
when we do and do not allow negative progress flips (a negative progress flip occurs when
a variable in more satisfied than unsatisfied constraints is flipped).


CHAPTER 3


Linear-time encodable and 
decodable error-correcting codes 


In this chapter, we build on the techniques developed in Chapter 2 to construct asymp- 
totically good families of error-correcting codes that have both linear-time encoding and 
decoding algorithms. We call these codes superconcentrator codes because their encoding 
circuits resemble Pippenger’s construction of superconcentrators [Pip77]. In Section 3.1, we 
explain why this resemblance is necessary. 

To aid in our construction of superconcentrator codes, we introduce the notion of an 
error-reducing code. An error-reducing code has the property that if a decoder receives a 
partially corrupted codeword, then it can correctly compute a large fraction of its message 
bits. The fraction of the message bits that the decoder cannot compute should be smaller 
than the fraction of the bits of the codeword that were corrupted in transmission. In 
Section 3.2, we construct error-reducing codes that have very fast linear-time encoding and 
error-reducing algorithms. One could imagine using such a code if one is transmitting data 
that is already encoded by an error-correcting code. In such a situation, it is not necessary 
to remove all the errors created in transmission, but it might be advantageous to quickly 
correct some of them. 

In Section 3.3, we demonstrate how to recursively combine error-reduction codes to build
our error-correcting superconcentrator codes. The constructions in Sections 3.2 and 3.3 are
analogous to those in Section 2.3. The only way that we know of obtaining expander graphs
of the quality needed for this construction is to choose the graphs at random.

To build explicit constructions of superconcentrator codes in Section 3.4, we use a
construction of error-reduction codes analogous to the construction of expander codes in
Section 2.5. The constructions and proofs in Section 3.4 are analogous to those in Section 3.3.


3.1 Motivating the construction 


It is not an accident that our linear-size encoding circuits look like superconcentrators. They 
have to. In this section, we will explain why. While this fact was the motivation behind 


our construction, one should be able to understand our construction without reading this 


section. 

Consider a circuit $C$ that takes as input some bits $x_1, \ldots, x_m$, and produces as output
bits $p_1, \ldots, p_n$ such that the words $x_1, \ldots, x_m, p_1, \ldots, p_n$ form a good error-correcting code.
This means that there is a constant $\delta$ such that even if we erase¹ any $\delta m$ of the input
bits and any $\delta n$ of the output bits, we can still recover the erased $\delta m$ input bits. We will
show that this means that there must be gate-disjoint paths from the erased inputs to some
subset of the un-erased outputs.

Assume that we cannot find $\delta m$ vertex-disjoint paths from the erased inputs to the
un-erased outputs. Then, Menger’s Theorem implies that there is some set of $\delta m - 1$ gates in
the circuit such that all paths in the circuit from the erased inputs to the un-erased outputs
must go through these gates. This contradicts our assumption that it is possible to recover
the values of the erased inputs because there are $\delta m$ bits of information in the erased input
gates, but only $\delta m - 1$ bits of information can get through to the un-erased output gates.

Thus, we see that vertex disjoint paths can be drawn in the underlying graph from
any $\delta m$ input gates into any $(1 - \delta)n$ output gates. While this property is not quite
as strong as the property required of superconcentrators, it is sufficiently close that we
decided that the easiest way to create linear-size encoding circuits would be to base them
on Pippenger’s [Pip77] construction of linear-size superconcentrators.

¹Usually, we discuss errors in which a bit’s value is flipped. It is also possible to consider the case in
which no value is received for some bit. In this case, the decoder knows that the value of the bit has been
lost. These errors are called erasure errors. Any error-correcting code that can tolerate the first type of
error can also tolerate erasure errors. If one is uncomfortable with erasure errors, then just flip a coin and
assign its value to each erased bit. One will probably be left with half as many errors of the first type.


3.2 Error-reducing codes 


In this section, we introduce the concept of an error-reducing code. While we are not sure 
if this idea will have practical applications, it will be useful for understanding our main 


construction. 


We will define an error-reducing code of rate $r < 1$, error-reduction $\varepsilon < 1$, and reducible
distance $\delta < 1$ to be the image of a function
\[ f : \{0,1\}^{rn} \to \{0,1\}^{n} \]
such that there exists a function
\[ g : \{0,1\}^{n} \to \{0,1\}^{rn} \]
such that for all $x \in \{0,1\}^{rn}$ and $z \in \{0,1\}^{n}$
\[ d(f(x), z) \le \delta n \;\Rightarrow\; d(x, g(z)) \le \varepsilon\, d(f(x), z). \]
$f$ is the encoding function, and $g$ is the decoding (error-reduction) function.

To construct an error-reducing code, we will modify the construction of expander codes
from Chapter 2. Let $B$ be an unbalanced bipartite expander graph with $n$ nodes of degree
$d$ on one side and $n/2$ nodes of degree $2d$ on the other. We will turn this graph into a
circuit by directing all the edges from the large side of the graph to the small side, letting
the nodes on the large side be the input gates, and letting the nodes on the small side be
parity gates² (see Figure 3-1). These parity gates are the outputs of the circuit.

Warning: The pictures that appear in this chapter are very similar to those that appeared in
Chapter 2, but their meaning is different! In Chapter 2, the nodes on the right-hand side of
the bipartite graphs represented restrictions of the nodes on the left-hand side. In this chapter,
the nodes on the right-hand side of the bipartite graphs have values that are determined by the
nodes on the left-hand side.

Figure 3-1: A circuit that encodes an error-reducing code.

²A parity gate computes the sum modulo 2 of its inputs.

We now use this circuit to define a code $R_{n,d}$ of rate 2/3 by placing the message bits
at the inputs to the circuit and using the outputs of the parity gates as the check bits.
The code we thereby obtain is a horrible error-correcting code: it has a word of weight
$d + 1$ (see Figure 3-2). However, if the graph is a good expander graph, then this code is
a good error-reducing code. It is obvious that we can encode this code in linear time (we
need merely compute the values of the parity gates, each of which has a constant number
of inputs). We will now show that it is possible to perform error-reduction on this code in
linear time.

Figure 3-2: A low-weight codeword: only one input bit is 1, and only those parity gates
that read this bit are 1; all others are 0.

The decoder naturally associates each bit of the word that it receives with one of the
gates in the encoding circuit. We will say that a parity gate in the circuit is satisfied by a
word if the bit associated with the gate is the parity of the bits associated with its inputs.
Otherwise, we will say that it is unsatisfied (see Figure 3-3). The decoder will successively
flip the bits associated with the input gates in an effort to decrease the number of unsatisfied
parity gates.

Figure 3-3: (a) is a satisfied parity gate. (b) is unsatisfied.


Sequential error-reduction algorithm: 


• If there is an input that is an input to more unsatisfied than satisfied parity gates,
then flip the value of that input.

• Repeat until no such inputs remain.


This algorithm is essentially the same as the sequential expander-code decoding algorithm 
of Section 2.3, so it is easy to implement it so that it takes constant time for each iteration. 
At each iteration, it decreases the total number of unsatisfied parity gates, so it can run for 


at most a linear number of iterations. 
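
The encoder and the sequential error-reduction algorithm can be sketched on a small structured graph. The graph below is an assumption chosen so that the example is easy to verify by hand, not an expander from the text: its 26 inputs are the blocks of the cyclic Steiner triple system on 13 points (base blocks {0, 1, 4} and {0, 2, 7}), so every input feeds $d = 3$ gates, every gate reads $2d = 6$ inputs, and two inputs share at most one gate. The simple rescanning loop is also a simplification; it does not achieve the constant time per iteration mentioned above.

```python
def sts13_gates():
    """Return (input_gates, gate_inputs) for a (3, 6)-regular graph:
    input i touches gates {(b + i) mod 13 : b in base[i // 13]}."""
    base = [(0, 1, 4), (0, 2, 7)]
    input_gates = [[(b + i) % 13 for b in base[i // 13]] for i in range(26)]
    gate_inputs = [[] for _ in range(13)]
    for i, gates in enumerate(input_gates):
        for g in gates:
            gate_inputs[g].append(i)
    return input_gates, gate_inputs

def encode(message, gate_inputs):
    """Check bits are the parities computed by the gates."""
    return [sum(message[i] for i in ins) % 2 for ins in gate_inputs]

def sequential_error_reduction(inputs, checks, input_gates, gate_inputs):
    """Flip any input lying in more unsatisfied than satisfied parity
    gates, and repeat until no such input remains."""
    inputs = list(inputs)

    def unsatisfied(g):
        return sum(inputs[i] for i in gate_inputs[g]) % 2 != checks[g]

    while True:
        for i, gates in enumerate(input_gates):
            bad = sum(unsatisfied(g) for g in gates)
            if 2 * bad > len(gates):   # more unsatisfied than satisfied
                inputs[i] ^= 1
                break                  # rescan from the start
        else:
            return inputs

INPUT_GATES, GATE_INPUTS = sts13_gates()
```

Each flip strictly decreases the number of unsatisfied gates, so the loop terminates. With a single corrupted input, that input sees three unsatisfied gates while every other input sees at most one, so only the corrupted input is flipped.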


Lemma 3.2.1 Let $R_{n,d}$ be derived from a degree $(d, 2d)$ bipartite graph between a set of
$n$ inputs and $n/2$ parity gates such that all sets of at most $\alpha n$ inputs expand by a factor of
at least $\left(\frac{3}{4}d + 1\right)$. Assume that the sequential error-reduction algorithm is given a word that
resembles a codeword of $R_{n,d}$ except that at most $\alpha n/2$ of the inputs have been corrupted and
at most $\alpha n/2$ of the gates have been corrupted. Then, after the termination of the sequential
error-reduction algorithm, at most $\alpha n/4$ of the inputs will be corrupted.
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Proof: We will let V denote the corrupted inputs, v the size of V, u the number of 
unsatisfied parity gates with inputs in V, and s the number of satisfied parity gates with 
inputs in V. We will view the pair (u,v) as the state of the algorithm. We will first show 
that if an/4 < v < an, then there is some input in more unsatisfied than satisfied parity 


gates. The expansion of the graph implies that 
3 
uts > (Fa+1) v. (3.1) 


Each gate with an input in V accounts for at least one wire leaving V. It is possible 
that as many as an/2 of the satisfied parity gates with inputs in V are satisfied because 
they have only one wire from V, but they have been corrupted. The rest must have two 


wires from V. By counting the dv wires leaving V, we obtain 
dv >u+s+(s—an/2) > dv+an/2>u4+2s (3.2) 


Combining equations (3.1) and (3.2), we find 


s < (Ja- 1) vtan/2, and 
1 
u > (54 + 2) v—an/2. (3.3) 


When an > v > an/4, we have u > dv/2, so there must be some input in more unsatisfied 
than satisfied parity gates. 

To show that the algorithm must terminate with v < an/4, we show that v must always 
be less than an. We assume that when the algorithm begins v < an/2 and therefore 
u <dan/2+ an/2. As the algorithm proceeds, u must steadily decrease. However, if the 
algorithm is ever in a state (u,v) in which v = an, then equation (3.3) would imply that 
u > dan/2+ 3an/2, which would be a contradiction. 

Thus, the algorithm must always maintain the condition that v < an. This implies that 


the algorithm cannot terminate unless it is in a state in which v < an/4. a 


We have proved that $R_{n,d}$ is an error-reducing code of rate 2/3 and error-reduction 1/2.

It is also possible to perform error-reduction on this code in parallel.

Parallel error-reduction algorithm:

• For each input, count the number of unsatisfied parity gates to which it is an input.
• For each input that is an input to more unsatisfied than satisfied parity gates, flip the
  value of that input.

It is easy to implement this algorithm as a constant-depth circuit of linear size.
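One round of the parallel rule can be simulated sequentially as follows. This is a minimal sketch, not the circuit implementation: the function name `parallel_error_reduction_round` and the list-based graph representation are assumptions made for illustration. All counts are taken from the word as received, and all flips are applied together.

```python
def parallel_error_reduction_round(inputs, gate_bits, gate_inputs):
    """Return the inputs after one synchronous round of flips.

    gate_inputs[g] lists the input indices wired into parity gate g.
    """
    n = len(inputs)
    unsat_count = [0] * n   # unsatisfied gates touching each input
    degree = [0] * n        # total gates touching each input
    for g, ins in enumerate(gate_inputs):
        parity = 0
        for i in ins:
            parity ^= inputs[i]
        bad = parity != gate_bits[g]
        for i in ins:
            degree[i] += 1
            unsat_count[i] += bad
    # Every decision uses the counts computed above, so the order of
    # evaluation does not matter: this models the parallel step.
    return [bit ^ 1 if 2 * unsat_count[i] > degree[i] else bit
            for i, bit in enumerate(inputs)]


# Toy example (not an expander): 4 inputs, one parity gate per pair of inputs.
PAIR_GATES = [[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]
PAIR_GATE_BITS = [1, 0, 0, 1, 1, 0]   # parities of the codeword inputs [1, 0, 1, 1]
```

Unlike the sequential rule, a single parallel round may turn clean inputs into corrupted ones, which is why the analysis that follows tracks both the inputs that fail to flip and the inputs that flip wrongly.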


Lemma 3.2.2 Let $R_{n,d}$ be derived from a degree $(d, 2d)$ bipartite graph between a set of $n$
inputs and $n/2$ parity gates such that all sets of at most $\alpha n$ inputs expand by a factor of at
least $(\frac{3}{4} + \epsilon)d$, for any $\epsilon > 2/d$. Assume that the parallel error-reduction algorithm is given a
word that resembles a codeword of $R_{n,d}$ except that at most $v \le \alpha n/2$ of the inputs have been
corrupted, and at most $b \le \alpha n/2$ of the gates have been corrupted. Then, after the execution
of the parallel error-reduction algorithm, at most

\[ \frac{v + \frac{b}{d}(6 + 8\epsilon)}{1 + 4\epsilon} \tag{3.4} \]

of the inputs will be corrupted.

Proof: We will let $V$ denote the set of corrupted inputs, $F$ the set of corrupted inputs
that fail to flip, and $C$ the set of inputs that were originally clean, but which become
corrupted by the parallel error-reduction algorithm. We will let $N(V)$, the neighbors of $V$,
denote the set of parity gates that contain inputs in $V$. We will let $v = |V|$, $\phi v = |F|$,
$\gamma v = |C|$, and $\delta d v = |N(V)|$.

We begin by obtaining a bound on $\phi$ in terms of $\delta$. Every input in $F$ is an input to at
least as many satisfied as unsatisfied parity gates. At most $b$ of these satisfied parity gates
are satisfied because they have been corrupted. Thus, at least $\frac{d\phi v}{2} - b$ of the wires leaving
$F$ end in a parity gate that contains a wire from another element of $V$. Thus, the set $V$
can have at most

\[ dv - \frac{d\phi v}{4} + \frac{b}{2} \]

neighbors. Because we have set the number of neighbors of $V$ to be $\delta d v$, we obtain

\begin{align*}
\delta d v &\le dv - \frac{d\phi v}{4} + \frac{b}{2} \\
\frac{d\phi v}{4} &\le (1 - \delta)\, dv + \frac{b}{2} \\
\phi &\le 4(1 - \delta) + \frac{2b}{dv}. \tag{3.5}
\end{align*}

Next, we will bound $\gamma$ in terms of $\delta$ by showing that

\[ \gamma < \frac{\delta - (\frac{3}{4} + \epsilon) + \frac{b}{dv}}{\frac{1}{4} + \epsilon}. \tag{3.6} \]

Assume by way of contradiction that this is false, and let $C'$ be a subset of $C$ of size exactly
$\frac{\delta - (\frac{3}{4} + \epsilon) + \frac{b}{dv}}{\frac{1}{4} + \epsilon}\, v$. Each input in $C'$ is an input to at least $d/2$ unsatisfied parity gates. At most $b$
of these gates can be unsatisfied because their parity gate has been corrupted. The others
must be unsatisfied because they have an input in $V$. Thus, the number of parity gates
containing inputs in the set $V \cup C'$ is at most

\[ d\delta v + \frac{d}{2}|C'| + b. \]

We will set $\epsilon > 2/d$, which will imply that $|C' \cup V| \le \alpha n$, so this set must expand by a
factor of at least $(\frac{3}{4} + \epsilon)d$. Thus, we obtain

\[ \left(\tfrac{3}{4} + \epsilon\right) d\, |C' \cup V| \le d\delta v + \frac{d}{2}|C'| + b, \]

which contradicts our assumption about the size of $C'$.

By combining equations (3.5) and (3.6), we find that the number of corrupted inputs
after an execution of the parallel error-reduction algorithm is at most

\begin{align*}
(\phi + \gamma)\, v
 &\le \left( 4(1 - \delta) + \frac{2b}{dv} + \frac{\delta - (\frac{3}{4} + \epsilon) + \frac{b}{dv}}{\frac{1}{4} + \epsilon} \right) v \\
 &= \frac{(\frac{1}{4} + 3\epsilon - 4\epsilon\delta) + \frac{b}{dv}(\frac{3}{2} + 2\epsilon)}{\frac{1}{4} + \epsilon}\, v \\
 &\le \frac{v + \frac{b}{d}(6 + 8\epsilon)}{1 + 4\epsilon},
\end{align*}

where the last step uses $\delta \ge \frac{3}{4}$, which follows from the expansion of $V$. ∎


We can iterate this algorithm a constant number of times in order to further reduce the
number of corrupted input bits.

Figure 3-4: A good code embedded in an error-reduction code.


3.3. Error-correcting codes 


The main idea in our construction of linear-time encodable and decodable error-correcting 
codes is to use the linear-time encodable and decodable error-reducing codes recursively. 
Imagine what happens when we take a good error-correcting code of length n and use the 
words of this code as the message bits of an error-reducing code (See Figure 3-4). We now 
have an error-correcting code of greater length with a higher error-tolerance; however, it has 
the same number of message bits as the original code. To increase the number of message 
bits in the code, we will create a new set of more message bits, and encode these message 
bits in an error-reduction code. We then use the check bits of this error-reduction code as 
the message bits of the error-correcting code we just constructed (See Figure 3-5). 

When we apply this construction recursively, we find that not only have we defined 
the error-correcting code, but we have created a linear-size circuit that encodes the error- 
correcting code. Moreover, the output of each gate of this circuit is a check bit of the 
error-correcting code. This will be true with one small exception: we should choose a
better base case for our recursion. The performance of our code will be dictated by the
quality of the code that we choose as our base. Thus, we may want to choose a particularly
good code of small block length. Another constraint is that we don't know that we will be
able to find good expander graphs of small size. However, there definitely is some constant
size after which we will be able to find good expander graphs, and which we will choose to
be the block length of our base code.

Figure 3-5: The recursive construction of $C_k$. (a) is the error-reducing code on the new
message bits. (b) is the error-reducing code placed on top of the error-correcting code.

We will now present a formal description of one family of superconcentrator codes. We
provide this description by describing the encoding circuits for these codes. We write
$C_k$ for an encoding circuit and $\mathcal{C}_k$ for the code that it encodes.

Description of superconcentrator codes:

• Choose absolute constants $b$ and $d$.
• Choose a code $\mathcal{C}_b$ of length $4 \cdot 2^b$, rate 1/4, and as large minimum distance as possible.
• Let $C_b$ be a circuit that takes $2^b$ bits as input and produces $3 \cdot 2^b$ bits as output so that
  these bits taken together form a codeword of $\mathcal{C}_b$. (The $2^b$ input bits are the message bits
  of the code, and the others are the check bits.)
• For $k > b$, let $R_k$ and $R'_k$ be circuits that encode an error-reduction code of rate 2/3 with
  $2^k$ message bits and $2^k/2$ check bits, as described in Section 3.2.
• To form circuit $C_k$ from $C_{k-1}$, take a copy of $R_k$ and use the inputs of $R_k$ as the inputs
  of $C_k$. Add $C_{k-1}$ to the circuit by identifying the output gates of $R_k$ with the inputs of
  $C_{k-1}$. Finally, attach a copy of $R'_{k+1}$ by identifying all the input and output gates of the
  copy of $C_{k-1}$ with the inputs of $R'_{k+1}$. The output gates of $C_k$ will be all the input and
  output gates of $C_{k-1}$ along with the output gates of $R'_{k+1}$ (see Figure 3-5).
• Let $\mathcal{C}_k$ be the rate 1/4 code obtained by taking $2^k$ message bits, feeding them into $C_k$,
  and using the $3 \cdot 2^k$ output bits as the parity checks of the code.
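The bookkeeping in the recursive description can be checked mechanically. The sketch below is an illustration under assumptions: the function name `superconcentrator_shape` is hypothetical, the constant of the base case is called `b`, and the base circuit is counted as having one gate per output bit. It tracks only the numbers of inputs, outputs, and gates, not the wiring.

```python
def superconcentrator_shape(k, b):
    """Return (inputs, outputs, gates) for the circuit at level k >= b."""
    if k == b:
        # Base circuit: 2^b inputs, 3 * 2^b outputs (one gate per output, by assumption).
        return 2**b, 3 * 2**b, 3 * 2**b
    inputs_prev, outputs_prev, gates_prev = superconcentrator_shape(k - 1, b)
    # Error-reduction circuit at level k: 2^k message bits, 2^k / 2 check bits,
    # whose outputs are identified with the inputs of the level k-1 circuit.
    first_checks = 2**k // 2
    assert first_checks == inputs_prev
    # Second error-reduction circuit (level k+1): its inputs are all input and
    # output gates of the level k-1 circuit.
    second_inputs = inputs_prev + outputs_prev
    assert second_inputs == 2**(k + 1)
    second_checks = second_inputs // 2
    outputs = inputs_prev + outputs_prev + second_checks
    gates = first_checks + gates_prev + second_checks
    return 2**k, outputs, gates
```

For every level the counts come out as $2^k$ inputs and $3 \cdot 2^k$ outputs, with the number of gates equal to the number of outputs, which matches the rate 1/4 claim and the remark that the output of each gate is a check bit.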
We can decode these codes in linear sequential time. 


Sequential superconcentrator code decoding algorithm:

• If $k = b$, then decode $\mathcal{C}_b$ using an arbitrary decoding algorithm.
• If $k > b$, then apply the sequential error-reduction algorithm to the nodes in $R'_{k+1}$.
  Now, recursively decode the nodes in the copy of $C_{k-1}$ using the sequential decoding
  algorithm for $\mathcal{C}_{k-1}$. Finish by applying the sequential error-reduction algorithm to the
  copy of $R_k$.
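The order of the steps in the recursive decoder, and the number of bits each step acts on, can be listed with a short sketch. The name `decode_schedule` and the step labels are hypothetical; the bit counts assume that an error-reduction circuit with $2^j$ message bits has $2^j/2$ check bits, as in the description above.

```python
def decode_schedule(k, b):
    """List the decoder's steps for level k as (step, level, bits_acted_on)."""
    if k == b:
        # Base case: decode the whole base codeword of length 4 * 2^b.
        return [("base-decode", b, 4 * 2**b)]
    # Outer error-reduction circuit at level k+1: 2^(k+1) message bits, 2^k check bits.
    steps = [("reduce R'", k + 1, 2**(k + 1) + 2**k)]
    steps += decode_schedule(k - 1, b)
    # Inner error-reduction circuit at level k: 2^k message bits, 2^(k-1) check bits.
    steps.append(("reduce R", k, 2**k + 2**(k - 1)))
    return steps
```

Summing the third component over all steps gives a geometric series dominated by the top level, which is the counting behind the linear running time.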


Theorem 3.3.1 If the superconcentrator code $\mathcal{C}_k$ is constructed from degree $(d, 2d)$ graphs
such that in each graph, every set containing at most an $\alpha$ fraction of the inputs expands by a
factor of at least $(\frac{3}{4}d + 1)$, and if $\mathcal{C}_b$ is chosen to be a code in which any $\alpha/2$ fraction of errors
can be corrected, then the sequential superconcentrator code decoding algorithm will correct up
to an $\alpha/8$ fraction of errors and will run in linear time.

Proof: We assume that there are at most $\alpha 2^k/2$ errors in the nodes of $\mathcal{C}_k$. By Lemma 3.2.1,
after we apply the sequential error-reduction algorithm to $R'_{k+1}$, there will be at most $\alpha 2^k/4$
errors in the nodes of the copy of $C_{k-1}$. We can now assume by induction that the decoding
algorithm for $\mathcal{C}_{k-1}$ will correct all the errors in its input and output nodes. As the
input nodes of $C_{k-1}$ are now the check bits of the error-reduction code $R_k$ corresponding
to the message originally contained at the inputs of the copy of $R_k$, and there are at most
$\alpha 2^k/2$ errors in these message bits, we can use the sequential error-reduction algorithm to
correct all the errors in the input nodes of $R_k$ (this can be easily observed from the proof
of Lemma 3.2.1, or from the analysis of the sequential expander code decoding algorithm
in Theorem 2.3.1). Since the inputs of $R_k$ are the inputs of $C_k$, we have removed all the
errors from the message bits of the code.

To see that this algorithm runs in linear time, observe that each error-reduction step
runs in time that is linear in the number of bits that it is acting on, and each step acts on
half as many bits as the previous step did. ∎


It is easy to see that there is a constant $\alpha$ that satisfies the requirements of Theorem 3.3.1,
but we will not attempt to optimize the constant here. We will note that the main constraint
on the constant is in the analysis of the quality of expansion obtained by a randomly chosen
graph.

Remark 3.3.2 We have only constructed codes of lengths $2^k$ where $k$ is an integer. It is easy
to use similar techniques to construct codes of other lengths.


We need to be slightly trickier to decode the superconcentrator codes in parallel
logarithmic time. The problem that we must overcome is that if we iterate the parallel
error-reduction algorithm enough times to remove all the errors from the input bits of $C_i$, we
will need to go through $O(i)$ iterations. If we did this for each $C_i$, then we would have an
$O(\log^2 n)$ time algorithm. To overcome this problem, we will perform the error-reductions
from the input bits of $C_{i-1}$ to the input bits of $C_i$ simultaneously for all $i$. Thus, while the
input gates of $C_{i-1}$ are being used to reduce the errors in the input gates of $C_i$, the input
gates in $C_i$ are being used to reduce the errors in the input gates of $C_{i+1}$.

In order to show that the reduction of errors of the bits of $C_i$ using the output bits of
$R'_{i+2}$ works, we will assume $\epsilon = \frac{1}{4} + \epsilon'$, for $\epsilon' > 0$ and $d > 16$. We now wish to observe that
if $b \le \alpha n/2$, and $\alpha n/4 \le v \le \alpha n/2$, then after one round of the parallel error-reduction
algorithm, $v$ will decrease by a constant multiplicative factor. From equation (3.4), we can
see that this constant factor will be bounded by the decrease that occurs when $v = \alpha n/4$
and $b = \alpha n/2$. By plugging these values into equation (3.4), we can see that this constant
is less than 1. We now know that if $v \le \alpha n/2$ and $b \le \alpha n/2$, then after a constant number


of rounds, we will have $v \le \alpha n/4$. Let $c$ be this constant.

To prove the correctness of the error-reductions on the input nodes of the $C_i$'s, we will
assume that there is a $w \le \alpha n/2$ such that $v \le w$ and $b \le w/2$. By substituting into
equation (3.4), we find that after one decoding round the number of corrupted inputs is
bounded by

\[ \frac{1 + \frac{3 + 4\epsilon}{d}}{1 + 4\epsilon}\; w. \]

Let $\gamma = \frac{1 + (3 + 4\epsilon)/d}{1 + 4\epsilon}$.
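The two constants used in the parallel decoder can be evaluated numerically from the bound of Lemma 3.2.2. The sketch below is arithmetic only and rests on an assumption: it reads bound (3.4) as $(v + \frac{b}{d}(6 + 8\epsilon))/(1 + 4\epsilon)$. The function names are hypothetical, and quantities are measured in units of $\alpha n$ (for `rounds_to_quarter`) or of $w$ (for `gamma`).

```python
from fractions import Fraction

def bound_3_4(v, b, d, eps):
    """Bound (3.4), read as (v + (b/d)(6 + 8*eps)) / (1 + 4*eps)."""
    return (v + Fraction(b) / d * (6 + 8 * eps)) / (1 + 4 * eps)

def rounds_to_quarter(d, eps):
    """Number of rounds c after which the bound on v drops to at most 1/4,
    starting from v = b = 1/2 (in units of alpha * n)."""
    if eps * d <= 3 + 4 * eps:
        # The iteration's fixed point would not lie below 1/4.
        raise ValueError("parameters do not give a contraction below 1/4")
    v, b, c = Fraction(1, 2), Fraction(1, 2), 0
    while v > Fraction(1, 4):
        v = bound_3_4(v, b, d, eps)
        c += 1
    return c

def gamma(d, eps):
    """Contraction factor obtained from (3.4) with v = w and b = w/2."""
    return bound_3_4(Fraction(1), Fraction(1, 2), d, eps)
```

With $\epsilon = 1/4$ and $d = 16$ the bound maps $v = \alpha n/4$, $b = \alpha n/2$ back to exactly $\alpha n/4$, which is why the text asks for $\epsilon' > 0$ and $d > 16$: any slack on either side makes the factor strictly less than 1.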


We can now state the parallel superconcentrator code decoding algorithm.

Parallel superconcentrator code decoding algorithm:

• For $i = k - 1$ down to $b$: Apply $c$ rounds of the parallel error-reduction algorithm using the
  inputs and outputs of $C_i$ as the message bits, and the outputs of $R'_{i+2}$ to which they are
  attached as the check bits.
• Decode the errors in $\mathcal{C}_b$ using any decoding algorithm.
• For $\log_{1/\gamma} 2^k$ rounds: Apply the parallel error-reduction algorithm to the copy of $R_{i+1}$
  between the inputs of $C_i$ and the input nodes of $C_{i+1}$, simultaneously for all $b \le i \le k - 1$.


Theorem 3.3.3 If the superconcentrator code $\mathcal{C}_k$ is constructed from degree $(d, 2d)$ graphs
such that in each graph, every set containing at most an $\alpha$ fraction of the inputs expands by a
factor of at least $(\frac{3}{4} + \frac{1}{4} + \epsilon')d$, for some $\epsilon' > 0$ and $d > 16$, and if $\mathcal{C}_b$ is chosen to be a code in
which any $\alpha/2$ fraction of errors can be corrected, then the parallel superconcentrator code
decoding algorithm will correct up to an $\alpha/8$ fraction of errors in logarithmic time with a linear
number of processors.

Proof: We begin by assuming that there are at most $\alpha 2^k/2$ errors in the bits of $\mathcal{C}_k$.
After we apply $c$ rounds of the parallel error-reduction algorithm to $R'_{k+1}$, there will be at
most $\alpha 2^k/4$ errors in the bits of the copy of $C_{k-1}$. Similarly, after we have finished the $i$-th
stage of the algorithm, there will be at most $\alpha 2^k/2^{i+1}$ errors in the bits of $C_{k-i}$. Thus,
$\mathcal{C}_b$ will have at most $\alpha 2^b/2$ errors, so the decoding algorithm for $\mathcal{C}_b$ will correct all the
errors in $C_b$.

We can now move on to the decoding of the input bits of the $C_i$'s. We have already
observed that the input bits of $C_i$ have at most $\alpha 2^i/2$ errors and that the input bits of $C_b$ are
free of error. Thus, after we apply one round of the error-reduction algorithm simultaneously
to all of the $R_i$'s, the input bits of $C_i$ will have at most $\gamma \alpha 2^i/2$ errors. Similarly, after we
apply the error-reduction algorithm for $\log_{1/\gamma} 2^k$ rounds, there will be no more errors in any
of the input nodes of $C_k$. ∎


3.4 Explicit Constructions 


The relation between our first constructions of superconcentrator codes and our explicit 
constructions is analogous to the relation between our constructions of expander codes in 
Sections 2.3 and 2.5. We will begin by generalizing our construction of error-reducing codes. 

If we view the inputs to one of the parity gates in Section 3.2 as the message bits of 
a code, then we can view the parity gate as computing a check bit associated with those 
message bits. This bit is chosen so that when it is considered with the input bits, these 
bits form a word in the code of even weight words. To generalize this construction, we will 
associate a collection of parity gates with each node on the small side of the bipartite graph, 
and we will connect them to their inputs so that these gates compute the check bits of a 
codeword in some more complex error-correcting code. We call such a collection of parity 
gates a cluster. 

As in Section 2.5, we will make use of a family of good expander graphs, such as those 
described in Section 2.4. 


We now define the circuits $R(G, \mathcal{C})$ that will encode our explicit error-reduction codes.

Explicit error-reduction codes:

• Let $G$ be a $d$-regular graph on $n$ vertices, and let $\mathcal{C}$ be a linear error-correcting code
  of block-length $l$ with $d$ message bits. Let $B$ be the edge-vertex incidence graph of $G$.
• The circuit $R(G, \mathcal{C})$ will have $dn/2$ input gates and $(l - d)n$ parity gates.
• Each node on the large side of $B$ will correspond to an input of $R(G, \mathcal{C})$. The parity
  gates are arranged into clusters of size $l - d$, and each cluster is identified with one of
  the nodes on the small side of $B$. The input gates that are neighbors of a cluster will
  be called the inputs of the cluster.
• The parity gates are connected to the input gates so that for each cluster, if the inputs
  of that cluster are the message bits of a codeword of $\mathcal{C}$, then the parity gates in the
  cluster compute the check bits of that codeword.
• Let $\mathcal{R}(G, \mathcal{C})$ denote the code obtained by using the inputs of $R(G, \mathcal{C})$ as message bits
  and the outputs of the circuit as check bits.

To prove that $\mathcal{R}(G, \mathcal{C})$ is a good error-reduction code if $G$ is a good expander graph, we
will use Lemma 2.4.6.

To perform error-reduction on $\mathcal{R}(G, \mathcal{C})$, we will associate each bit of a received word
with an input or gate of $R(G, \mathcal{C})$, as we did in Section 3.2.
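The construction can be made concrete on a very small graph. The following sketch is only an illustration under assumptions: the names `edge_vertex_incidence`, `encode_explicit`, and `check_matrix` are hypothetical, the local code is given by a matrix over GF(2) whose rows say which of a cluster's inputs are XORed into each check bit, and the toy graph is the complete graph on 4 vertices, which is far too small to be a useful expander.

```python
def edge_vertex_incidence(edges, n):
    """For each vertex of G, list the indices of its incident edges.

    Edges play the role of input gates; vertices play the role of clusters.
    """
    cluster_inputs = [[] for _ in range(n)]
    for e, (a, b) in enumerate(edges):
        cluster_inputs[a].append(e)
        cluster_inputs[b].append(e)
    return cluster_inputs

def encode_explicit(message, edges, n, check_matrix):
    """Compute all parity-gate outputs, cluster by cluster.

    message      -- one bit per edge of G (d*n/2 bits for a d-regular graph)
    check_matrix -- (l - d) rows of length d over GF(2) describing the local code
    """
    checks = []
    for ins in edge_vertex_incidence(edges, n):
        local = [message[e] for e in ins]
        for row in check_matrix:
            bit = 0
            for coeff, x in zip(row, local):
                bit ^= coeff & x
            checks.append(bit)
    return checks


# Toy instance: complete graph on 4 vertices (3-regular, 6 edges),
# local code with d = 3 message bits and l - d = 2 check bits.
K4_EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
TOY_CHECKS = [[1, 1, 0], [0, 1, 1]]
```

In this toy instance there are $dn/2 = 6$ inputs and $(l - d)n = 8$ parity gates, and each input contributes to exactly two clusters, one for each endpoint of its edge.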


Parallel explicit error-reduction algorithm:

• In parallel, for each cluster, if the bits associated with the inputs and gates of a cluster
  are within $\epsilon d/6$ of a codeword of $\mathcal{C}$, then send a "flip" message to every input that needs
  to be flipped to obtain that codeword.
• In parallel, every input that receives the message "flip", flips its value.
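A single round of the cluster-based rule can be simulated as below. This is a sketch under assumptions, not the construction's intended implementation: the names are hypothetical, each cluster finds its nearest local codeword by brute force over all $2^d$ messages (reasonable only because $d$ is a constant), the decoding radius is passed in explicitly as `radius`, and the gate bits of cluster `c` are assumed to be stored contiguously.

```python
from itertools import product

def parallel_explicit_round(inputs, gate_bits, clusters, check_matrix, radius):
    """One round: every cluster within `radius` of a local codeword requests flips.

    clusters[c]  -- indices of the inputs of cluster c
    check_matrix -- rows over GF(2) defining the check bits of the local code
    gate_bits    -- received check bits, len(check_matrix) per cluster, in cluster order
    """
    d = len(check_matrix[0])
    r = len(check_matrix)
    flips = set()
    for c, ins in enumerate(clusters):
        local_msg = [inputs[e] for e in ins]
        local_chk = gate_bits[c * r:(c + 1) * r]
        best = None
        for cand in product((0, 1), repeat=d):
            chk = [sum(a & x for a, x in zip(row, cand)) % 2 for row in check_matrix]
            dist = (sum(a != b for a, b in zip(cand, local_msg))
                    + sum(a != b for a, b in zip(chk, local_chk)))
            if best is None or dist < best[0]:
                best = (dist, cand)
        if best[0] <= radius:
            # Send "flip" to each input that differs from the nearby codeword.
            for e, want, have in zip(ins, best[1], local_msg):
                if want != have:
                    flips.add(e)
    # All flips are applied together, after every cluster has decided.
    return [bit ^ 1 if e in flips else bit for e, bit in enumerate(inputs)]


# Toy instance: clusters of the complete graph on 4 vertices and a local
# code with 3 message bits, 3 check bits, and minimum distance 3.
K4_CLUSTERS = [[0, 1, 2], [0, 3, 4], [1, 3, 5], [2, 4, 5]]
LOCAL_CHECKS = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
```

Keeping the radius below half the minimum distance of the local code makes the nearby codeword unique, so a cluster never has to choose between two candidate corrections.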


We can now prove a lemma for our explicit construction that is analogous to Lemma 3.2.2. 


Lemma 3.4.1 Let $\{G_{n,d}\}$ be a family of good expander graphs. There exist constants $d$ and $g$
such that if the parallel explicit error-reduction algorithm is given as input a word that resembles
a codeword of $\mathcal{R}(G_{n,d}, \mathcal{C})$ except that, for some $w \le \frac{\epsilon^2}{g} \cdot \frac{nd}{2}$,

• at most $w/2 \le v \le 2w$ inputs are corrupted and at most $b \le w$ parity gates are corrupted,
  then after the execution of the algorithm, the number of corrupted inputs will decrease by a
  constant multiplicative factor, and
• if at most $v \le w/2$ of the inputs are corrupted and at most $b \le w$ of the parity gates are
  corrupted, then after the execution of the algorithm, at most $w/2$ inputs will be corrupted.

Proof: Let $V$ be the set of $v$ corrupted inputs. Set $\alpha$ and $\beta$ so that $v = \alpha \frac{nd}{2}$ and
$b = \beta \frac{nd}{2}$.

We will say that a cluster is confused if it sends a "flip" message to an input that is
not corrupt. In order for a cluster to be confused, it must have at least $\frac{5}{6}\epsilon d$ corrupt inputs
and gates. Thus, there can be at most

\[ \frac{n \frac{d}{2} (2\alpha + \beta)}{\frac{5}{6}\epsilon d} \]

confused clusters. Each of these can send at most $\frac{1}{6}\epsilon d$ "flip" messages, so at most

\[ \frac{n \frac{d}{2} (2\alpha + \beta)}{\frac{5}{6}\epsilon d} \cdot \frac{1}{6}\epsilon d = \frac{n \frac{d}{2} (2\alpha + \beta)}{5} \]

uncorrupted inputs can receive "flip" signals.

We will call a cluster unhelpful if it has a node of $V$ among its inputs, but it fails to
send a "flip" signal to that node. There can be at most

\[ \frac{n \frac{d}{2} (2\alpha + \beta)}{\frac{1}{6}\epsilon d} \]

unhelpful clusters. By Lemma 2.4.6, there can be at most

\[ n \frac{d}{2} \left( \left( \frac{6\alpha + 3\beta}{\epsilon} \right)^2 + \left( \frac{6\alpha + 3\beta}{\epsilon} \right) \frac{\lambda}{d} \right) \]

inputs both of whose neighbors are unhelpful clusters.

The total number of corrupted inputs after the algorithm is run will be at most

\[ n \frac{d}{2} \left( \frac{2\alpha + \beta}{5} + \left( \frac{6\alpha + 3\beta}{\epsilon} \right)^2 + \left( \frac{6\alpha + 3\beta}{\epsilon} \right) \frac{\lambda}{d} \right). \]

Set $w = \frac{\epsilon^2}{g} \cdot \frac{nd}{2}$. We want to show that

\[ \alpha > \frac{2\alpha + \beta}{5} + \left( \frac{6\alpha + 3\beta}{\epsilon} \right)^2 + \left( \frac{6\alpha + 3\beta}{\epsilon} \right) \frac{\lambda}{d} \]

for $\beta \in [0, \frac{\epsilon^2}{g}]$ and $\alpha \in [\frac{\epsilon^2}{2g}, \frac{2\epsilon^2}{g}]$. Because the function in consideration is decreasing in $\beta$, and
quadratic in $\alpha$ with positive coefficients, it suffices to consider the function at the points
$\alpha \in \{\frac{\epsilon^2}{2g}, \frac{2\epsilon^2}{g}\}$ and $\beta = \frac{\epsilon^2}{g}$. For $\alpha = \frac{\epsilon^2}{2g}$, we need

\[ \frac{\epsilon^2}{10g} > \left( \frac{6\epsilon}{g} \right)^2 + \frac{6\epsilon}{g} \cdot \frac{\lambda}{d}, \]

and for $\alpha = \frac{2\epsilon^2}{g}$, we need

\[ \frac{\epsilon^2}{g} > \left( \frac{15\epsilon}{g} \right)^2 + \frac{15\epsilon}{g} \cdot \frac{\lambda}{d}. \]

We now see that, given $\epsilon$, we can choose $g$ and then $d$ so that both of these inequalities
hold. For lower values of $w$, the claim follows from a similar analysis.

To prove the second claim, it suffices to observe that the function in question is
decreasing in $\alpha$ for $\alpha > 0$. ∎


We will find it convenient to let $\mathcal{C}$ be a code of length $5d/4$ and rate 4/5. By
Theorem 1.4.1, there exists a code with length $\frac{5}{4}d$, rate $\frac{4}{5}$, and relative minimum distance $\epsilon$ for
$H(\epsilon) < 1/5$.

To obtain an explicit construction of superconcentrator codes, we will need expander
graphs of very particular sizes. Fortunately, we demonstrated in Section 2.4.1 how to
construct good expanders of every sufficiently large number of vertices. By Proposition 2.4.19
and Remark 2.4.20, we can construct bipartite graphs between any number of degree-2
vertices and a set of vertices in which each vertex has degree $d - 1$ or $d$ and such that the
graphs have expansion properties like those graphs used in Lemma 3.4.1. The effect of the
differing degrees of the vertices on the small side of the graph can be made negligible by
slightly altering some of the codes associated with some of these vertices. Thus, we can
assume that we can obtain good error-reduction codes of any sufficiently large size, and
that they perform as described in Lemma 3.4.1.


Explicit superconcentrator codes:

• Choose a family of good expander graphs, $\mathcal{G} = \{G_{n,d}\}$, such that the subfamily for each $d$
  contains a dense subsequence.
• Choose a constant $d$ and a code $\mathcal{C}_d$ of rate 4/5, length $5d/4$ and minimum relative distance
  $\epsilon$ where $H(\epsilon) < 1/5$. (As described in the previous paragraph, we will assume that $\mathcal{G}$
  contains a graph of every sufficiently large number of vertices in which every vertex has
  degree $d$ or $d - 1$.)
• Choose a constant $b$ and a code $\mathcal{C}_b$ of length $4 \cdot 2^b$, rate 1/4, and as large minimum distance
  as possible. Let $C_b$ be a circuit that takes $2^b$ bits as input and produces $3 \cdot 2^b$ bits as
  output so that these bits taken together form a codeword of $\mathcal{C}_b$. (The $2^b$ input bits are
  the message bits of the code, and the others are the check bits.)
• For $k > b$, let $R_k$ and $R'_k$ be the circuits $R(G_{2^{k+1}/d,\,d}, \mathcal{C}_d)$ defined earlier in this section. We
  have chosen the code $\mathcal{C}_d$ so that this circuit has $2^k$ input gates and $2^k/2$ parity gates (if
  the graph isn't $d$-regular, then we might have to modify the codes at the clusters slightly
  to make sure that we have $2^k/2$ parity gates, but this effect is insignificant).
• To form circuit $C_k$ from $C_{k-1}$, take a copy of $R_k$ and use the inputs of $R_k$ as the inputs
  of $C_k$. Add $C_{k-1}$ to the circuit by identifying the output gates of $R_k$ with the inputs of
  $C_{k-1}$. Finally, attach a copy of $R'_{k+1}$ by identifying all the input and output gates of the
  copy of $C_{k-1}$ with the inputs of $R'_{k+1}$. The output gates of $C_k$ will be all the input and
  output gates of $C_{k-1}$ along with the output gates of $R'_{k+1}$.
• Let $\mathcal{C}_k$ be the code obtained by taking $2^k$ message bits, feeding them into $C_k$, and using
  the $3 \cdot 2^k$ output bits as the parity bits of the code.


Lemma 3.4.1 implies that there is a constant $c$ such that if $v \le w$ and $b \le w$, where
$w \le 2^k \epsilon^2/g$, then after $c$ executions of the parallel explicit error-reduction algorithm, we
will have at most $w/2$ corrupted inputs. Similarly, there exists a constant $\gamma < 1$ such that if
$v \le 2w$ and $b \le w$, then after the execution of the error-reduction algorithm there will be
at most $\gamma v$ corrupted inputs.

Parallel explicit error-correction algorithm:

• Identical to the algorithm presented in Section 3.3, but using the definitions of $C_k$
  and $R_k$ presented in this section.

Theorem 3.4.2 There exist settings of the parameters of the constructions of the rate 1/4
superconcentrator codes $\mathcal{C}_k$ presented in this section, as well as settings of $\epsilon$ and $g$, such that the
parallel explicit superconcentrator code decoding algorithm will correct up to an $\epsilon^2/4g$ fraction
of errors in logarithmic time with a linear number of processors. Moreover, this algorithm can
be simulated in linear sequential time.


Proof: We use Theorem 2.4.7, Theorem 2.4.15, Proposition 2.4.19, and Remark 2.4.20
to show that there exist expander graphs in the appropriate sizes to construct the 2/3-rate
error-reduction codes $R_k$. The similarity of Proposition 2.4.19 to Lemma 2.4.6 implies that
these error-reduction codes will perform as described in Lemma 3.4.1. The rest of the proof
of the first claim is identical to the proof of Theorem 3.3.3.

By keeping track of which clusters of parity gates are "unsatisfied", it is fairly simple
to simulate this algorithm in linear time on a sequential machine (assuming that pointer
references have unit cost). The idea is similar to that used in Theorem 2.5.2. ∎


Remark 3.4.3 One can vary this construction to obtain asymptotically good linear-time en- 
codable and decodable codes of any constant rate. To achieve rates exceeding 1/4, one need 
merely vary the ratios of the number of gates between successive levels of the construction and 


possibly omit the last few error-reduction codes. 


Remark 3.4.4 We can replace the assumption that pointer references have unit cost with the
assumption that retrieving $O(\log n)$ bits stored at a pointer has $O(\log n)$ cost. We do this by
concatenating the superconcentrator codes with some good code of length $O(\log n)$. The inner
code can be decoded by $O(n/\log n)$ table look-ups (or it can be decoded recursively) and the
outer code now contains $O(n/\log n)$ symbols of length $O(\log n)$. The outer code can now be
decoded in linear time using $O(n/\log n)$ pointer references if we use a natural generalization of
the superconcentrator codes to alphabets of size $O(\log n)$.


3.5 Some thoughts on implementation and future work 


Superconcentrator and expander codes will probably only be useful in coding schemes using
long block lengths; for small block lengths, many better special codes are known. The
tradeoff of rate versus error tolerance of the superconcentrator codes that we have constructed is
not nearly as good as the tradeoff obtained by the expander codes constructed in Chapter 2.
In our view, we have added a lot of extra redundant information in order to make the codes
encodable and decodable in linear time. Thus, we do not feel that it would be appropriate
to use these codes on a channel in which bandwidth is at a premium. However, if one has a
fast channel with bandwidth to spare, then it might be reasonable to use the
superconcentrator codes that we have constructed here. In particular, if one has a fast channel, then
the computational work needed to encode and decode the error-correcting codes used can
be a greater bottleneck in communication than the redundancy of the code. For an analysis
of potential applications of long block length error-correcting codes, we direct the reader
to [Bie92].

Those who desire to implement these codes should consider the issues discussed in 
Section 2.6. They should also be aware that our construction was optimized for ease of 
explication and that there are many parameters that one can vary which we have not 
discussed. For example, it might be advantageous to use error-reduction codes of rates 
other than 2/3 in the recursive construction. 

However, the reason that we did not attempt to optimize the tradeoff of rate versus error 
tolerance in our construction is that we believe that a new idea will be required to construct 
codes of rate and error tolerance competitive with the Gilbert-Varshamov bound. That is 
our goal: to obtain linear-time encodable and decodable error-correcting codes with rate 
and error tolerance as good as the best known codes, whatever their algorithmic efficiency. 
Our feeling is that this goal is achievable and that the work in this chapter constitutes an
important step towards it, but that it cannot be achieved by simple modifications of our
current construction.


CHAPTER 4 


Holographic Proofs 


A holographic proof system is a system of writing and checking proofs in which one can 
probabilistically check the validity of a proof by examining only a few of its bits. If someone 
tells you that they have written a holographic proof of a theorem, then there is a simple 
procedure by which you can randomly choose a few bits of the proof to examine, perform 
a simple computation on these bits, and decide whether to accept or reject the proof. You 
will always accept a valid holographic proof. Conversely, if the probability that you accept 
is greater than 1/2, then the proof that you examined must be very close to a holographic 
proof of the theorem; thus, a proof of the theorem does exist. You can repeat this test 
a few times to obtain an arbitrarily high degree of confidence that the theorem is true. 
Remarkably, any theorem that has a conventional proof has a holographic proof. 
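The effect of repeating the test can be quantified with a small calculation. The sketch below assumes independent repetitions and uses the contrapositive of the statement above: if no proof of the theorem exists, each run accepts with probability at most 1/2, so $t$ runs all accept with probability at most $2^{-t}$. The function name `repetitions_needed` is hypothetical.

```python
import math

def repetitions_needed(max_error):
    """Number of independent runs t such that 2**(-t) <= max_error."""
    if not 0 < max_error < 1:
        raise ValueError("max_error must lie strictly between 0 and 1")
    return math.ceil(math.log2(1 / max_error))
```

For example, seven repetitions already push the chance of accepting a proof of a false theorem below one percent, and the number of bits read grows only linearly in the number of repetitions.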

Constructions of holographic proofs¹, which are also known as transparent proofs and
probabilistically checkable proofs, were developed in a series of papers [BFL91, BFLS91,
FGL+91, AS92b, ALM+92] which culminated in the following theorem:

Theorem 4.0.1 [PCP-Theorem] Let $E$ be an asymptotically good error-correcting code of
minimum relative distance $\delta$ such that there is a polynomial-time algorithm that will determine
whether a given word is a codeword of $E$. Then, for any constant $k$, there exists a probabilistic
algorithm $V$ that expects $k + 2$ inputs, $(Y_0, Y_1, \ldots, Y_k, \Pi)$, and runs in time $\log^{O(1)} n$ such that

• $V$ only reads a constant number of bits from each of its $k + 2$ inputs;
• if $Y_0 = E(C)$ for some circuit $C$ that expects $k$ inputs of lengths $n_1, \ldots, n_k$ and if
  $Y_i = E(X_i)$ where $|X_i| = n_i$, for each $i$, and $C$ accepts on input $(X_1, \ldots, X_k)$, then
  there is a string $\Pi$ such that $V$ accepts on these inputs with probability 1; moreover, this
  string $\Pi$ can be computed from $C$ and $X_1, \ldots, X_k$ in time $|\Pi| \log^{O(1)} |\Pi|$;
• if the probability that $V$ accepts on inputs $(Y_0, \ldots, Y_k, \Pi)$ is greater than 1/2, then there
  exists a circuit $C$ that accepts $k$ inputs of lengths $(n_1, \ldots, n_k)$ and strings $X_1, \ldots, X_k$ of
  the corresponding lengths such that $C$ accepts on input $(X_1, \ldots, X_k)$ and $d(E(C), Y_0) < \delta/3$
  and $d(E(X_i), Y_i) < \delta/3$ for each $i$;
• $V$ only uses $O(\log |C|)$ random bits; and, $\Pi$ has size $|C|^{O(1)}$.

¹ Some authors use the terms holographic and transparent to describe proof systems in which the theorem to
be proved is presented in an encoded format. This encoding is necessary if the proof checker is to run in
poly-logarithmic time or read only a constant number of bits of the theorem. They use the term probabilistically
checkable to describe systems in which the proof checker is allowed to read the entire statement of the theorem
and run in polynomial time. The purpose of the encoded theorems is discussed further in Section 4.4.


We will explore why this theorem takes the form that it does later in this chapter. To
see how it implies a statement similar to the informal statement given at the start of this
section, we draw the following corollary:

Corollary 4.0.2 There exists a probabilistic polynomial-time Turing machine $V$ that accepts
two inputs, a description of a circuit $C$ and a witness $\Pi$. $V$ reads only a constant number of
bits of its input $\Pi$ and uses only $O(\log |C|)$ random bits. If there is an assignment that satisfies
$C$, then there is an input $\Pi$ that will cause $V$ to accept with probability 1. Conversely, if there
is an assignment of $\Pi$ that causes $V$ to accept with probability greater than 1/2, then $C$ has a
satisfying assignment.


The witness II in this corollary is a holographic proof that C has a satisfying assignment. 
To derive our assertion about proofs of theorems, we observe that the problem of deciding 
whether any statement has a proof of a given length can be converted into a problem 
of deciding whether a certain circuit has a satisfying assignment. Using the techniques 
of [BFLS91], one can show that the size of this circuit will be greater than the size of the 


statement of the theorem and its proof by at most a poly-logarithmic factor.

The main contribution of this chapter, Theorem 4.6.1, is a strengthening of Theorem 4.0.1
in which the holographic proof, $\Pi$, has size $O(|C|^{1+\epsilon})$, for any $\epsilon > 0$. The
size of the proof will depend on the number of queries that the proof checker makes. If
we allow the proof checker to make as many as $O(\log |C|)$ queries, then we will be able to


construct proofs of size $|C| (\log |C|)^{O(\log \log |C|)}$.


4.0.1 Outline of Chapter 


Our construction of holographic proofs follows the general plan used in [BFLS91]. The 
authors of that paper explained how to construct holographic proofs of size $O(|C|^{1+\epsilon})$, for
any $\epsilon > 0$, in which the proof checker reads $O(\log |C|)$ bits of the proof. In order to construct
such small proofs, they first had to develop an efficient way to translate any proof into a 
simple, well-structured format. In Section 4.3.1, we present a simpler but slightly weaker 
means of achieving this translation. The framework presented in this section should be 
sufficient for most applications. In Section 4.6, we explain how our techniques can be fully 
integrated with the framework of [BFLS91]. 

Our holographic proofs are built from special polynomial error-correcting codes that 
have efficient checkers and verifiers. In Section 4.1, we define checkers and verifiers, and 
develop the terminology we will use to discuss them. 

Section 4.2 is devoted to defining the polynomial codes that we use and to demonstrating 
that they have the checkers and verifiers that we desire. The main statement of this section, 
Theorem 4.2.19, is an improved analysis of the “low-degree testing” used in Arora and 
Safra [AS92b]—it shows that the testing works over a domain whose size is linear in the 
degree being tested. We present the material in this section using the language we developed 
in Section 4.1. 

In Section 4.3, we combine the analysis from Section 4.2 with ideas from [BFLS91], 
[AS92b], and [Sud92] to develop relatively simple holographic proofs that can be checked 
by examining a constant number of segments of the proofs, each of size $\sqrt{n} \log^{O(1)} n$. The
proofs in this system can be used to provide proofs of many types of statements, including
circuit satisfiability. This section has three parts. In the first, we present the NP-complete 


problem upon which we base our proof system. In Sections 4.3.2 and 4.3.3, we provide
an algebraic description of the problem. The key to this description is a novel algebraic
description of a de Bruijn graph, Lemma 4.3.5. This algebraic description is combined with 
the techniques of Section 4.2 to construct holographic proofs in Section 4.3.4. 

In Section 4.4, we recursively apply the holographic proof system of Section 4.3 to itself 
to obtain the first variation of our construction of nearly linear size holographic proofs, 
Theorem 4.4.1. The recursion that we use differs from those in [AS92b] and [ALM+92] in
that it does not need consistency checking. We do not show that the proofs in this first 
variation have checkers that run in poly-logarithmic time. In Theorem 4.5.1, we show that 
there is an easily computed table of information such that, if the proof checker has access 
to this table, then it can run in poly-logarithmic time. Essentially, this table captures 
computations that the proof checker would perform regardless of its input. That is the 
second variation of our construction. In Section 4.6, we use a technique from [BFLS91] to 
construct checkers that do not need such a table. This statement, Theorem 4.6.1, is the 
final variation of our main theorem. We conclude by stating a weaker but simpler corollary 
of this theorem. 

Since the history of the development of holographic proofs is rather involved, we do not
explain it here, but refer the reader to one of

• Sanjeev Arora. Probabilistic Checking of Proofs and Hardness of Approximation Problems.
PhD thesis, U.C. Berkeley, Aug. 1994.

• László Babai. Transparent proofs and limits to approximation. In First European
Congress of Mathematics: (Paris, July 6-10, 1992), volume 2, pages 31-92. Birkhäuser,
1994.

• Oded Goldreich. A taxonomy of proof systems. SIGACT News, 24(4)–25(1),
December 1993–March 1994.

• Madhu Sudan. Efficient checking of polynomials and proofs and the hardness of
approximation problems. PhD thesis, U.C. Berkeley, Oct. 1992.

4.1 Checkable and verifiable codes


In this section, we define checkable and verifiable codes. Such codes have played an
important role in the construction of holographic proofs and were used implicitly in the
work of [LFKN90] and [BFL91]. However, we first saw them explicitly defined in [Bab94]
and [BF93].


Definition 4.1.1 A checker for a family of codes $\{C_i\}$ of lengths $\{n_i\}$ and relative minimum
distance $\delta$ over an alphabet $\Sigma$ is a probabilistic algorithm such that

• The checker accepts each word of $C_i$ with probability 1.

• If the probability that the checker accepts the word $w$ of length $n_i$ is greater than 1/2,
then there is a unique codeword of $C_i$ of relative distance at most $\delta/3$ from $w$.


We will usually measure the performance of a checker by the number of bits of its input 
that it reads. Occasionally, we consider the computational complexity of the checker. 

One could imagine using a checker in a communication system in which one has the 
option of requesting the retransmission of a message. A checker would be able to read 
only a constant number of bits of a received signal and then estimate the chance that a 
decoder will be able to correct its errors. If the checker determines that it is unlikely that 
the decoder will be able to correct the errors, then the checker can instantly request a 
retransmission of that block, before the decoder has wasted its time trying to decode the 
message. Unfortunately, all known codes with such checkers have rates that approach zero. 
Thus, current constructions would be inappropriate for use in real communication systems. 

Usually, we will impose more structure on the bits of a codeword and the ways in which a
checker can access them. We will partition the bits of a code into segments and insist that a
checker read all the bits of a segment if it reads any one. For example, if $s$ is a segment that
contains the $i$-th through $j$-th bits, and if $x = (x_1, \ldots, x_n)$ is a word, then the value of $x$ on
segment $s$ is $(x_i, \ldots, x_j)$. If an algorithm reads $x_i$, we will charge it for reading $x_{i+1}, \ldots, x_j$
as well. Note that we define segments to be disjoint sets of bit-positions. When we describe
a checker of a code whose bits are broken into segments, we will count how many segments
of the code a checker reads and how many bits those segments contain.

Remark 4.1.2 This division of the bits into segments can exist purely in the mind of the
checker. It does not imply that any formatting symbols appear in the data. We describe the
bits of a code as being broken into segments to indicate the way in which our algorithms will
access the bits of that code.


The checkable codes in holographic proofs are derived from verifiable codes. In addition 
to being checkable, a verifiable code has the property that one can probabilistically verify 


that individual bits of a received word have not been corrupted. 


Definition 4.1.3 Let $\{C_i\}$ be a family of codes of lengths $\{n_i\}$ and relative minimum distance
$\delta$ over an alphabet $\Sigma$ such that the bits of each code are partitioned into segments. A verifier
for $\{C_i\}$ is a probabilistic algorithm that takes a word $w$ and the name of a segment $s$ as input
such that

• if $w$ is a codeword of $C_i$, then the verifier accepts, and

• if the probability that the verifier accepts a word $w$ (of length $n_i$) on segment $s$ is greater
than 1/2, then there is a unique codeword $c$ of $C_i$ of relative distance at most
$\delta/3$ from $w$ such that $c$ has the same value as $w$ on segment $s$.


As with checkable codes, we will measure the performance of a verifier by the number 
of segments of the code that it reads and by how many bits those segments contain. 

We like to view the set of holographic proofs of a statement as being a checkable code
with semantic restrictions.² Consider a code consisting of the proofs of a given length of a
certain statement. If the statement is false or if it has no proofs of that length, then the
code will be empty, and this is OK. Now, assume that this code also has a checker that
reads only a constant number of bits of its input. The checker will accept every codeword
in the code, and if it accepts some word with high probability, then that word must be close
to a codeword. We have just described a holographic proof system.

Current constructions of holographic proofs have verifiers that read only a constant
number of bits of their input. Thus, as one checks a holographic proof that a circuit is
satisfiable, one can probabilistically obtain any bit of the satisfying assignment of the circuit
while only reading a constant number of bits of the proof. This fact will be very important
to us in Section 4.4.

²Consider a circuit that has many satisfying assignments. For each of these satisfying assignments, we
obtain a different holographic proof that the circuit is satisfiable. We view this set of proofs as words in an
error-correcting code.

4.2 Bi- and Trivariate codes


A remarkable property of holographic proofs is that they enable us to demonstrate
properties of some object while only examining a few bits of that object. In this section, we
will take a first step in this direction by presenting a constant rate code of length $n$ that
can be checked and verified by examining $O(\sqrt{n})$ of its bits. Actually, the code that we
develop will be somewhat stronger than this. The code will be divided into segments. Each
segment will have size $O(\sqrt{n})$, and we will be able to check and verify the code while only
examining data from a constant number of segments. We will later use these codes in a
recursive construction to obtain codes that are checkable by examining fewer bits in more
segments. Versions of these polynomial codes and their checking and verification algorithms
first appeared in [AS92b] and [Sud92]. Our contribution is an improvement in their
analysis (Theorem 4.2.19) which enables us to use versions of these codes that have much less
redundancy than was previously possible.

The codes we construct in this section are polynomial codes over a field. We will use
$\mathcal{F}$ to denote this field. It doesn't matter what field we use, so long as it is big enough to
contain the objects that we describe.

The codewords in our first code will correspond to polynomials in two variables, say $x$
and $y$, such that their degree in $x$ is bounded by a parameter $d$ and their degree in $y$ is
bounded by a parameter $e$. Since we will use such vector spaces often in this chapter, we
will memorialize their description in a definition:


Definition 4.2.1 A polynomial $p(x_1, \ldots, x_k)$ has degree $(d_1, \ldots, d_k)$ if the degree of $p$ in $x_i$
is at most $d_i$, for each $i$. We do not demand that the $d_i$'s be integers.


There are a number of ways that we could present a polynomial of degree $(d, e)$. The
most natural way of presenting such a polynomial would be to write down its coefficients
(Figure 4-1a). Another way of presenting such a polynomial is to choose $X \subseteq \mathcal{F}$ and $Y \subseteq \mathcal{F}$
so that $|X| > d$ and $|Y| > e$, and then write down the values of $p$ at each point of $X \times Y$
(Figure 4-1b).³ Both of these presentations involve writing down at least $(d + 1)(e + 1)$
pieces of information (elements of $\mathcal{F}$).

³The reader should verify that such a presentation uniquely defines the polynomial $p$.

Our presentations of polynomials will contain even more information. While the extra
information will not be necessary to specify the polynomial, it will be useful when we want
to establish facts about the polynomial without reading its entire presentation.

We now define a presentation of a polynomial to be the list of its values over some
domain, along with the lists of univariate polynomials obtained when one of its variables
is restricted to a value in that domain (Figure 4-1c).


Definition 4.2.2 Let $p(x, y)$ be a polynomial of degree $(d, e)$ over a field $\mathcal{F}$, and let $X, Y \subseteq \mathcal{F}$.
A presentation of $p$ over $X \times Y$ consists of

• the list of values of $p$ at each point of $X \times Y$:

$$(p(x, y) : x \in X \text{ and } y \in Y), \text{ and}$$

• for each $x_0 \in X$, the coefficients of the univariate degree $e$ polynomial obtained by
restricting $p$ to $x_0$ (we can view this as a function from $X$ to the space of degree $e$
polynomials in $y$), and

• for each $y_0 \in Y$, the coefficients of the univariate degree $d$ polynomial obtained by
restricting $p$ to $y_0$ (which we can view as a function from $Y$ to the space of degree $d$
polynomials in $x$).

A presentation is viewed as being divided into segments. Each value of $p$ and each univariate
polynomial listed in the presentation is a separate segment. We refer to $X \times Y$ as the domain
of the presentation.
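
As an informal illustration of the definition of a presentation, the following Python sketch builds the three lists for a small polynomial. It is only a sketch under stated assumptions: the field is taken to be GF(101), the polynomial is stored as a coefficient table `coeffs[i][j]` for the monomial $x^i y^j$, and all function names are hypothetical choices made for this example.

```python
from itertools import product

P_MOD = 101  # assumed prime modulus; the field is GF(101)


def evaluate(coeffs, x, y):
    """Evaluate p(x, y) = sum coeffs[i][j] * x^i * y^j over GF(P_MOD)."""
    return sum(c * pow(x, i, P_MOD) * pow(y, j, P_MOD)
               for i, row in enumerate(coeffs)
               for j, c in enumerate(row)) % P_MOD


def restrict_x(coeffs, x0):
    """Coefficients (in y) of the univariate polynomial p(x0, y)."""
    width = len(coeffs[0])
    return [sum(coeffs[i][j] * pow(x0, i, P_MOD) for i in range(len(coeffs))) % P_MOD
            for j in range(width)]


def restrict_y(coeffs, y0):
    """Coefficients (in x) of the univariate polynomial p(x, y0)."""
    return [sum(c * pow(y0, j, P_MOD) for j, c in enumerate(row)) % P_MOD
            for row in coeffs]


def presentation(coeffs, X, Y):
    """Return the three lists of a presentation of p over X x Y."""
    values = {(x, y): evaluate(coeffs, x, y) for x, y in product(X, Y)}
    columns = {x0: restrict_x(coeffs, x0) for x0 in X}  # one segment per x0
    rows = {y0: restrict_y(coeffs, y0) for y0 in Y}     # one segment per y0
    return values, columns, rows


# p(x, y) = 1 + 2y + 3x + 4xy, a polynomial of degree (1, 1).
coeffs = [[1, 2], [3, 4]]
X = list(range(8))
Y = list(range(8))
values, columns, rows = presentation(coeffs, X, Y)
```

Each entry of `values`, `columns`, and `rows` plays the role of one segment. Storing every list with fixed-size entries keyed by the point or the coordinate mirrors the requirement that a reader knows instantly where to look.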


Remark 4.2.3 When we write "list of values of $p$", we mean that the data should be organized
into a list in which each element has the same length so that if one wants to look up the value
of $p$ at $(x, y)$, then one knows instantly where to look in the list. The same should hold for
the lists of univariate polynomials. Thus, these three lists could be viewed as one function from
$X \times Y \to \mathcal{F}$ followed by a function from $X \to \mathcal{F}^{e+1}$ followed by a function from $Y \to \mathcal{F}^{d+1}$.


Remark 4.2.4 The definition of a presentation of a degree $(d, e)$ polynomial naturally
generalizes to degree $(d_1, \ldots, d_k)$ polynomials. A presentation of a degree $(d_1, \ldots, d_k)$ polynomial
over a domain $X_1 \times \cdots \times X_k$ should consist of the list of values of the polynomial over that
domain, and $k$ lists of the univariate polynomials obtained by restricting the polynomial in all
but one of its variables.

Figure 4-1: Three ways to describe a polynomial. (c) is a presentation.


Presentations of distinct polynomials over sufficiently large domains are far apart. This
follows easily from the following lemma by considering the polynomial which is their difference.


Lemma 4.2.5 [Schwartz [Sch80]] Let $p$ be a degree $(d_1, \ldots, d_k)$ polynomial over a domain
$D = D_1 \times \cdots \times D_k$ where $|D_i| = n_i$ for $i = 1, \ldots, k$. If $p$ has more than

$$|D| \sum_{i=1}^{k} \frac{d_i}{n_i}$$

zeros in $D_1 \times \cdots \times D_k$, then $p$ must be the zero polynomial.

Proof: We prove this by induction on $k$. We already know the base case: a non-zero univariate
polynomial of degree $d$ can have at most $d$ zeros. Now, assume that we have proved the
lemma for $k - 1$. There exist degree $(d_1, \ldots, d_{k-1})$ polynomials $p_0, \ldots, p_{d_k}$ such that

$$p(x_1, \ldots, x_k) = \sum_{i=0}^{d_k} p_i(x_1, \ldots, x_{k-1}) \, x_k^i.$$

If one of the $p_i$'s is non-zero, then it is zero for at most a $\sum_{j=1}^{k-1} d_j/n_j$ fraction of the values of
$(x_1, \ldots, x_{k-1})$. For the values of $(x_1, \ldots, x_{k-1})$ such that it is non-zero, $p$ can be zero
at most a $d_k/n_k$ fraction of the time. □
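
A small numerical check can make the counting in Schwartz's lemma concrete. The Python sketch below counts the zeros of one fixed non-zero polynomial on a grid and compares the count with the standard form of the bound, $|D| \sum_i d_i/n_i$. The example polynomial, the integer grid, and the function names are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product


def count_zeros(poly, domains):
    """Count the points of D_1 x ... x D_k at which poly evaluates to zero."""
    return sum(1 for point in product(*domains) if poly(*point) == 0)


def schwartz_bound(degrees, domains):
    """Return |D| * sum_i d_i / n_i as an exact fraction."""
    size = 1
    for D in domains:
        size *= len(D)
    return size * sum(Fraction(d, len(D)) for d, D in zip(degrees, domains))


# p(x, y) = (x - 1)(x - 2)(y - 3) has degree (2, 1) and is not the zero polynomial.
def p(x, y):
    return (x - 1) * (x - 2) * (y - 3)


domains = [range(10), range(10)]
zeros = count_zeros(p, domains)
bound = schwartz_bound((2, 1), domains)
```

Here the zeros are the two columns $x = 1$ and $x = 2$ together with the row $y = 3$, which gives $20 + 10 - 2 = 28$ points, below the bound of $100 \cdot (2/10 + 1/10) = 30$.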
The first thing we will prove about presentations of bivariate polynomials is that if we
are given some data and we are told that it is a presentation of a bivariate polynomial,
then we can probabilistically check whether it is a presentation of a bivariate polynomial
while only reading a constant number of segments of the data. It is not possible to prove
such a statement for presentations over domains that are too small, so our theorem will
only apply to domains that are larger than the degree of the polynomial by some factor.
Our contribution is that we prove this for domains whose size is greater than $de$ by only a
constant factor. Previously, Arora and Safra [AS92b] proved such a statement for domains
whose size was cubic in $de$. Sudan [Sud92] later improved this to quadratic. In order to
obtain nearly-linear size holographic proofs, we needed to improve their bounds.

In order to state the checking algorithm, we will need to state precisely what we mean
by "data" that could be a presentation of a bivariate polynomial. The following definition
essentially describes a string that is of the same length as a presentation of a degree $(d, e)$
polynomial. The division of the string into segments is purely for the convenience of our
description, and places no restriction on the definition.
Definition 4.2.6 A $(d, e)$-presentation over $X \times Y$ consists of:

• a string that we view as a list of elements of $\mathcal{F}$, one for each point of $X \times Y$; we say this
string assigns an element of $\mathcal{F}$ to each point of $X \times Y$;

• for each $x_0 \in X$, a string whose length is the same as the length of the description of a
degree $e$ univariate polynomial over $\mathcal{F}$; and

• for each $y_0 \in Y$, a string whose length is the same as the length of the description of a
degree $d$ univariate polynomial over $\mathcal{F}$.

We will just write presentation if $(d, e)$ is clear from context.


Remark 4.2.7 This definition does not actually impose any constraints on the content of the
$(d, e)$-presentation. Rather, it should be viewed as a description of how the presentation is
broken into segments and how those segments will be interpreted by the algorithms that read
them.

Remark 4.2.8 This definition naturally generalizes to a definition of a $(d_1, \ldots, d_k)$-presentation
in a fashion analogous to Remark 4.2.4.


We have defined a (d,e)-presentation so that a presentation of a degree (d,e) polynomial 
is one. We will now state the algorithm for checking whether a (d,e)-presentation is a 


presentation of a degree (d,e) polynomial. 
Bivariate presentation checking algorithm:

• Choose a point $(x, y)$ uniformly at random from $X \times Y$.

• Examine the value, $v$, assigned to that point, the string that is assigned to $x$, and the
string assigned to $y$ (in a presentation of a degree $(d, e)$ polynomial, these represent
the value of the polynomial at $(x, y)$, its restriction to $x$, and its restriction to $y$).
Accept if the string assigned to $x$ represents a polynomial that takes the value $v$ when
evaluated at $y$ and the string assigned to $y$ represents a polynomial that takes the
value $v$ when evaluated at $x$.

The reader should verify that if this algorithm is run on a presentation of a bivariate degree
$(d, e)$ polynomial on $X \times Y$, then the algorithm will always accept. We will now see that
the converse of this statement is true. That is, if this algorithm accepts with probability 1,
then the presentation must be a presentation of a bivariate polynomial of degree $(d, e)$.
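
The checking algorithm above can be sketched in a few lines of Python. This is a minimal illustration, not the construction used later in the chapter: it assumes the field GF(101), stores the three lists of a presentation as dictionaries, and uses hypothetical names throughout. The function `acceptance_probability` enumerates all points instead of sampling, so that the acceptance probability can be inspected exactly.

```python
import random
from fractions import Fraction

P_MOD = 101  # assumed prime modulus; arithmetic is in GF(101)


def eval_univariate(coeffs, t):
    """Evaluate sum(coeffs[i] * t**i) over GF(P_MOD) by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * t + c) % P_MOD
    return acc


def passes_at(values, columns, rows, x, y):
    """The test performed at one point (x, y): three segments are read."""
    v = values[(x, y)]
    return (eval_univariate(columns[x], y) == v and
            eval_univariate(rows[y], x) == v)


def check_once(values, columns, rows, X, Y, rng=random):
    """One run of the checking algorithm: pick a uniform point and test it."""
    return passes_at(values, columns, rows, rng.choice(X), rng.choice(Y))


def acceptance_probability(values, columns, rows, X, Y):
    """Exact acceptance probability, obtained by enumerating every point."""
    good = sum(passes_at(values, columns, rows, x, y) for x in X for y in Y)
    return Fraction(good, len(X) * len(Y))


# Honest presentation of p(x, y) = 1 + 2y + 3x + 4xy over an 8-by-8 domain.
X = list(range(8))
Y = list(range(8))
values = {(x, y): (1 + 2 * y + 3 * x + 4 * x * y) % P_MOD for x in X for y in Y}
columns = {x: [(1 + 3 * x) % P_MOD, (2 + 4 * x) % P_MOD] for x in X}  # coefficients in y
rows = {y: [(1 + 2 * y) % P_MOD, (3 + 4 * y) % P_MOD] for y in Y}     # coefficients in x

honest = acceptance_probability(values, columns, rows, X, Y)

# Corrupt a single value segment; only the point (2, 3) now fails the test.
corrupted = dict(values)
corrupted[(2, 3)] = (corrupted[(2, 3)] + 1) % P_MOD
damaged = acceptance_probability(corrupted, columns, rows, X, Y)
```

An honest presentation is accepted at every point, while corrupting one value out of 64 lowers the acceptance probability to 63/64. A single run therefore cannot detect sparse corruption reliably, which is why the analysis that follows speaks of presentations that are close to a true presentation rather than equal to one.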


Proposition 4.2.9 [Well-known] Let $X = \{x_1, \ldots, x_m\}$, $Y = \{y_1, \ldots, y_n\}$, and let $d < m$
and $e < n$. Let $f(x, y)$ be a function on $X \times Y$ such that for $1 \le j \le n$, $f(x, y_j)$ agrees on $X$
with some degree $d$ polynomial in $x$, and for $1 \le i \le m$, $f(x_i, y)$ agrees on $Y$ with some degree
$e$ polynomial in $y$. Then, there exists a polynomial $P(x, y)$ of degree $(d, e)$ such that $f(x, y)$
agrees with $P(x, y)$ everywhere on $X \times Y$.

Proof: Recall that a degree $d$ univariate polynomial is uniquely determined by its values
at $d + 1$ points. For $1 \le j \le e + 1$, let $p_j(x)$ be the degree $d$ polynomial that agrees with
$f(x, y_j)$. For $1 \le j \le e + 1$, let $\delta_j(y)$ be the degree $e$ polynomial in $y$ such that

$$\delta_j(y_k) = \begin{cases} 1, & \text{if } j = k, \text{ and} \\ 0, & \text{if } 1 \le k \le e + 1, \text{ but } j \ne k. \end{cases}$$

We let $P(x, y) = \sum_{j=1}^{e+1} \delta_j(y) p_j(x)$. It is clear that $P$ has degree $(d, e)$. Moreover, $P(x, y_j) =
f(x, y_j)$ for all $x \in X$ and $1 \le j \le e + 1$. To see that in fact $P(x, y) = f(x, y)$ for all
$(x, y) \in X \times Y$, fix any $x \in X$ and observe that $P$ and $f$ agree at the $e + 1$ points
$(x, y_1), \ldots, (x, y_{e+1})$ of the column indexed by $x$. Since $f$ agrees with some degree $e$ polynomial
in $y$ on that column, that polynomial must be the restriction of $P$ to the column. □
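
The construction in this proof is ordinary Lagrange interpolation applied first along rows and then along columns, and it can be tried out directly. The Python sketch below works over the rationals instead of a finite field purely for convenience; the test function `f`, the domains, and the helper names are assumptions made for this example.

```python
from fractions import Fraction


def lagrange_basis(nodes, j, t):
    """Value at t of the polynomial that is 1 at nodes[j] and 0 at the other nodes."""
    out = Fraction(1)
    for k, node in enumerate(nodes):
        if k != j:
            out *= Fraction(t - node, nodes[j] - node)
    return out


def interpolate(nodes, vals, t):
    """Value at t of the unique polynomial of degree < len(nodes) through (nodes, vals)."""
    return sum(v * lagrange_basis(nodes, j, t) for j, v in enumerate(vals))


def build_P(f, X, Y, d, e):
    """Return P(x, y) = sum_j delta_j(y) * p_j(x), using only e + 1 rows of f."""
    xs, ys = X[:d + 1], Y[:e + 1]

    def P(x, y):
        total = Fraction(0)
        for j in range(e + 1):
            # p_j(x): the degree-d polynomial agreeing with f on row y_j.
            p_j = interpolate(xs, [f(xi, ys[j]) for xi in xs], x)
            # delta_j(y): 1 at y_j and 0 at the other chosen rows.
            total += lagrange_basis(ys, j, y) * p_j
        return total

    return P


def f(x, y):
    return 2 + x * y**2 - 3 * x**2  # a function of degree (2, 2)


X = list(range(6))
Y = list(range(7))
P = build_P(f, X, Y, d=2, e=2)
agrees_everywhere = all(P(x, y) == f(x, y) for x in X for y in Y)
```

Although `P` is built from only a 3-by-3 corner of the table of values, it reproduces `f` on the whole 6-by-7 domain, as the proposition predicts.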


Sections 4.2.1 through 4.2.3 will be devoted to proving that, if the algorithm accepts a
$(d, e)$-presentation with probability close to 1, then that presentation must look a lot like
a presentation of some degree $(d, e)$ polynomial. We formalize the notion of a presentation
being close to a presentation of a degree $(d, e)$ polynomial by:

Definition 4.2.10 A $(d, e)$-presentation, $P$, over a domain $X \times Y$ is $\epsilon$-good if there exists a
degree $(d, e)$ polynomial $p$ such that the presentation of $p$ over $X \times Y$ differs from $P$ in at most
an $\epsilon$ fraction of their segments in each list.

We note that if a presentation $P$ is $\epsilon$-good for a sufficiently small $\epsilon$, then there is a
unique polynomial $p$ to whose presentation $P$ is close:


Proposition 4.2.11 Let $P$ be an $\epsilon$-good presentation of a degree $(d_1, \ldots, d_k)$ polynomial over
a domain $D_1 \times \cdots \times D_k$ where $|D_i| = n_i$, for $i = 1, \ldots, k$. If

$$1 - 2\epsilon > \sum_{i=1}^{k} \frac{d_i}{n_i},$$

then there is a unique polynomial $p$ such that $P$ and the presentation of $p$ differ in at most an
$\epsilon$ fraction of their segments.

Proof: Assume, by way of contradiction, that there were two distinct degree $(d_1, \ldots, d_k)$
polynomials $p$ and $q$ whose presentations agreed with $P$ in all but an $\epsilon$ fraction of their segments. Then the
presentations of $p$ and $q$ agree in all but at most a $2\epsilon$ fraction of their segments. So, $p$ and $q$
must have the same values for at least a $1 - 2\epsilon$ fraction of the domain. This would imply that
$p - q$ is zero for at least a $1 - 2\epsilon$ fraction of the domain. However, by Lemma 4.2.5, this would imply
that $p - q = 0$, contradicting the assumption that $p$ and $q$ were distinct. □


Our aim is to now prove

Lemma 4.2.12 Let $P$ be a $(d, e)$-presentation over a domain $X \times Y$ such that $|X| > 8d$
and $|Y| > 8e$. If the probability that the bivariate presentation checking algorithm accepts is
greater than $1 - \epsilon$, for $\epsilon < 1/16$, then the presentation $P$ is $3\epsilon$-good. That is, the code of
$(d, e)$-presentations over the domain $X \times Y$ has a checker that reads only a constant number
of segments of its input, each of size $O(\sqrt{n})$.

Proof: The first part follows from Theorem 4.2.19, which is to be proved in Sections 4.2.1,
4.2.2, and 4.2.3. An alternative approach to proving this theorem is outlined in
Section 4.2.4. To obtain the checker, it is necessary to run the bivariate presentation checking
algorithm a constant number of times. □


4.2.1 The First Step 


In our proof of Theorem 4.2.19, we will restrict our attention to the lists of univariate
polynomials contained within a presentation. In Remark 4.2.3, we observed that the list
of univariate polynomials of degree $e$ in $y$ could be viewed as a function from $X \to \mathcal{F}^{e+1}$.
We can also view this list as a function in $x$ and $y$ that is arbitrary in the $x$ direction, but
always looks like a degree $e$ polynomial in $y$. Because $X$ has size $m$, an arbitrary function
on $X$ can be represented as a degree $m$ polynomial.⁴ Thus, we can also view this list as a
description of a degree $(m, e)$ polynomial in $x$ and $y$. We will call this polynomial $C(x, y)$.
We will obtain a degree $(d, n)$ polynomial from the other list, and call that polynomial
$R(x, y)$. If the bivariate presentation checking algorithm accepts with high probability then
$R(x, y)$ and $C(x, y)$ agree on most of $X \times Y$.

⁴Actually, it can be represented as a degree $m - 1$ polynomial. But, we will ignore this fact because it is
a pain to write "$m - 1$" everywhere.

Let $C(x, y)$ be a polynomial of degree $(m, e)$ and let $R(x, y)$ be a polynomial of degree
$(d, n)$ such that

$$\mathop{\mathrm{Prob}}_{(x,y) \in X \times Y} [R(x, y) \ne C(x, y)] < \gamma.$$

In Theorem 4.2.19, we will show that if $\gamma$ is a sufficiently small constant, then there exists
a polynomial $Q(x, y)$ of degree $(d, e)$ such that

$$\mathop{\mathrm{Prob}}_{(x,y) \in X \times Y} [R(x, y) \ne Q(x, y) \text{ or } C(x, y) \ne Q(x, y)] < 2 \mathop{\mathrm{Prob}}_{(x,y) \in X \times Y} [R(x, y) \ne C(x, y)].$$


As in [Sud92], we begin by finding a low-degree "error correcting" polynomial that is
zero whenever $R$ and $C$ disagree.


Lemma 4.2.13 Let $S \subseteq X \times Y$ be a set of size at most $\delta^2 mn$. Then, there exists a non-zero
polynomial $E(x, y)$ of degree $(\delta m, \delta n)$ such that $E(x, y) = 0$ for all $(x, y) \in S$.

Proof: The set of polynomials of degree $(\delta m, \delta n)$ is a vector space of dimension
$(\lfloor \delta m \rfloor + 1)(\lfloor \delta n \rfloor + 1)$. Consider the map that sends a polynomial to the vector of values that it takes
for each point in $S$. That is, let $S = \{s_1, \ldots, s_k\}$ and consider the map

$$\phi : E(x, y) \mapsto (E(s_1), E(s_2), \ldots, E(s_k)).$$

This map is a homomorphism of a vector space of dimension $(\lfloor \delta m \rfloor + 1)(\lfloor \delta n \rfloor + 1)$ into a
vector space of dimension at most $\delta^2 mn$, which is smaller. So, there must be a non-zero
polynomial in the vector space of polynomials of degree $(\delta m, \delta n)$ that evaluates to zero at
every point in $S$. □
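
The dimension-counting argument is constructive: a suitable $E$ can be found by solving a homogeneous linear system. The Python sketch below does this by Gaussian elimination modulo a prime. The field GF(101), the parameters $m = n = 10$ and $\delta = 0.3$ (so that $\lfloor \delta m \rfloor = \lfloor \delta n \rfloor = 3$ and $\delta^2 mn = 9$), the set `S`, and every function name are assumptions chosen for illustration.

```python
def nullspace_vector(rows, ncols, p):
    """Return a non-zero v with rows * v = 0 (mod p), or None if only v = 0 works."""
    rows = [r[:] for r in rows]
    pivot_cols = []
    r = 0
    for c in range(ncols):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] % p), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        inv = pow(rows[r][c], -1, p)  # modular inverse (Python 3.8+)
        rows[r] = [(v * inv) % p for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] % p:
                factor = rows[i][c]
                rows[i] = [(a - factor * b) % p for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    free = [c for c in range(ncols) if c not in pivot_cols]
    if not free:
        return None
    v = [0] * ncols
    v[free[0]] = 1  # set one free variable to 1, the other free variables to 0
    for row_idx, c in enumerate(pivot_cols):
        v[c] = (-rows[row_idx][free[0]]) % p
    return v


def vanishing_polynomial(S, a, b, p):
    """Non-zero E of degree (a, b) over GF(p) with E(s) = 0 for every s in S."""
    monomials = [(i, j) for i in range(a + 1) for j in range(b + 1)]
    rows = [[pow(x, i, p) * pow(y, j, p) % p for (i, j) in monomials] for (x, y) in S]
    v = nullspace_vector(rows, len(monomials), p)
    if v is None:
        raise ValueError("S is too large for the requested degree")
    return dict(zip(monomials, v))


def evaluate(E, x, y, p):
    """Evaluate E, given as a map from exponent pairs (i, j) to coefficients."""
    return sum(c * pow(x, i, p) * pow(y, j, p) for (i, j), c in E.items()) % p


p = 101
a = b = 3  # floor(delta * m) and floor(delta * n) for m = n = 10, delta = 0.3
S = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8), (5, 1), (6, 3), (7, 5), (8, 7)]
E = vanishing_polynomial(S, a, b, p)
```

There are $(3 + 1)(3 + 1) = 16$ unknown coefficients and only 9 linear constraints, so a non-zero solution must exist, exactly as in the proof.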


Let $S$ be the subset of $X \times Y$ on which $R$ and $C$ disagree. By Lemma 4.2.13, we can
choose $E(x, y)$ so that

$$R(x, y)E(x, y) = C(x, y)E(x, y) \text{ for all } (x, y) \in X \times Y.$$

Moreover, $C(x, y)E(x, y)$ is a polynomial of degree $(m + \delta m, e + \delta n)$ and $R(x, y)E(x, y)$ is
a polynomial of degree $(d + \delta m, n + \delta n)$. By Proposition 4.2.9, there exists a polynomial
$P(x, y)$ of degree $(d + \delta m, e + \delta n)$ such that

$$R(x, y)E(x, y) = C(x, y)E(x, y) = P(x, y), \text{ for all } (x, y) \in X \times Y. \qquad (4.1)$$

We would like to divide $P$ by $E$ as formal polynomials and conclude the proof. However,
the most we can say is that

$$\frac{P(x, y)}{E(x, y)} = R(x, y) = C(x, y)$$

for all $(x, y) \in X \times Y$ such that $E(x, y) \ne 0$.

The next two sections will be devoted to showing that if $n$ is sufficiently large, then $E$
does in fact divide $P$. We will begin with one small step:


Lemma 4.2.14 Let $E(x, y)$, $P(x, y)$, $R(x, y)$ and $C(x, y)$ be polynomials of degrees $(\delta m, \delta n)$,
$(d + \delta m, e + \delta n)$, $(d, n)$ and $(m, e)$ respectively such that (4.1) holds. If $|X| > \delta m + d$ and
$|Y| > \delta n + e$, then for all $y_0 \in Y$ and for all $x_0 \in X$, $P(x, y_0) = R(x, y_0)E(x, y_0)$ and
$P(x_0, y) = C(x_0, y)E(x_0, y)$.

Proof: For fixed $y_0$, $P(x, y_0)$ and $R(x, y_0)E(x, y_0)$ are degree $d + \delta m$ polynomials that
have the same value on at least $d + \delta m + 1$ points, so $P(x, y_0) = R(x, y_0)E(x, y_0)$ as formal
polynomials in $x$. The other case is proved similarly. □


From this point, we know two routes to the proof of Lemma 4.2.12. The route that we 
pursue in the main text (Sections 4.2.2 and 4.2.3) is more elementary. In Section 4.2.4, we 
show how one can replace Section 4.2.2 and Lemma 4.2.18 with an application of Bézout's
Theorem. This route is more direct, but may be accessible to fewer readers.


4.2.2 Resultants

In this section, we will review some standard facts about resultants. A more complete
presentation can be found in [Lan93, vdW53]. We note that Sudan [Sud92] introduced the
idea of using the resultant to prove that $E$ divides $P$.

Let $\mathcal{F}$ be a field and let

$$P(x) = P_0 + P_1 x + \cdots + P_r x^r, \text{ and}$$

$$E(x) = E_0 + E_1 x + \cdots + E_s x^s$$

be polynomials in $x$ with coefficients in the field $\mathcal{F}$.

Proposition 4.2.15 $P(x)$ and $E(x)$ have a non-trivial common factor if and only if there exist
polynomials $A(x)$ of degree $s - 1$ and $B(x)$ of degree $r - 1$ such that $P(x)A(x) - E(x)B(x) = 0$.

Proof: If there exist $F(x)$, $\tilde{P}(x)$ and $\tilde{E}(x)$ such that $\deg F(x) \ge 1$, $P(x) = F(x)\tilde{P}(x)$,
and $E(x) = F(x)\tilde{E}(x)$, then we can choose $A(x) = \tilde{E}(x)$ and $B(x) = \tilde{P}(x)$.
To go the other direction, assume that such $A(x)$ and $B(x)$ exist. Since the degree of
$P(x)$ is greater than the degree of $B(x)$, $P(x)$ and $E(x)$ must share a common factor. □


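We can reformulate this as a system of linear equations in the coefficients of $A$ and $B$:

$$\begin{aligned}
P_0 A_0 &= E_0 B_0 \\
P_1 A_0 + P_0 A_1 &= E_1 B_0 + E_0 B_1 \\
P_2 A_0 + P_1 A_1 + P_0 A_2 &= E_2 B_0 + E_1 B_1 + E_0 B_2 \\
&\ \,\vdots \\
P_r A_{s-1} &= E_s B_{r-1}
\end{aligned}$$

If we treat the coefficients of $A$ and $B$ as the variables of a system of linear equations, then
we find that the above equations have a non-zero solution if and only if the matrix $M(P, E)$ has
determinant zero, where

$$M(P, E) = \begin{pmatrix}
P_r & P_{r-1} & \cdots & \cdots & P_0 & 0 & \cdots & 0 \\
0 & P_r & \cdots & \cdots & P_1 & P_0 & \cdots & 0 \\
\vdots & & \ddots & & & & \ddots & \vdots \\
0 & \cdots & 0 & P_r & \cdots & \cdots & P_1 & P_0 \\
E_s & E_{s-1} & \cdots & E_0 & 0 & \cdots & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & & \vdots \\
0 & \cdots & \cdots & 0 & E_s & E_{s-1} & \cdots & E_0
\end{pmatrix},$$

in which the first $s$ rows hold shifted copies of the coefficients of $P$ and the last $r$ rows hold
shifted copies of the coefficients of $E$.

We now define $R(P, E)$, the resultant of $P$ and $E$, to be the polynomial in the coefficients
of $P$ and $E$ obtained by taking the determinant of $M(P, E)$. We obtain:

Proposition 4.2.16 The polynomials $P(x)$ and $E(x)$ share a non-trivial common factor if and only if
$R(P, E) = 0$.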
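
The resultant criterion is easy to experiment with. The Python sketch below assembles the matrix $M(P, E)$ for two univariate polynomials with rational coefficients and computes its determinant by exact Gaussian elimination. The coefficient ordering (highest degree first), the example polynomials, and the function names are assumptions made for this illustration.

```python
from fractions import Fraction


def sylvester_matrix(P, E):
    """Build M(P, E); P and E list coefficients from highest degree to lowest."""
    r, s = len(P) - 1, len(E) - 1
    size = r + s
    rows = []
    for shift in range(s):  # s shifted rows of the coefficients of P
        rows.append([0] * shift + list(P) + [0] * (size - shift - r - 1))
    for shift in range(r):  # r shifted rows of the coefficients of E
        rows.append([0] * shift + list(E) + [0] * (size - shift - s - 1))
    return rows


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if a[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[c][c]
        for i in range(c + 1, n):
            factor = a[i][c] / a[c][c]
            a[i] = [x - factor * y for x, y in zip(a[i], a[c])]
    return det


def resultant(P, E):
    return determinant(sylvester_matrix(P, E))


P = [1, -3, 2]            # (x - 1)(x - 2)
E_shared = [1, 2, -3]     # (x - 1)(x + 3), shares the factor (x - 1) with P
E_coprime = [1, 7, 12]    # (x + 3)(x + 4), no common factor with P

res_shared = resultant(P, E_shared)
res_coprime = resultant(P, E_coprime)
```

The first resultant vanishes because the two polynomials share the root 1. The second equals the product of $E(x) = (x + 3)(x + 4)$ over the roots of $P$, namely $20 \cdot 30 = 600$, which is non-zero.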
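The following fact about the derivative of the determinant of a matrix of polynomials will
play a crucial role in our proof: Let

$$M(x) = \begin{pmatrix}
p_{1,1}(x) & p_{1,2}(x) & \cdots & p_{1,k}(x) \\
p_{2,1}(x) & p_{2,2}(x) & \cdots & p_{2,k}(x) \\
\vdots & \vdots & \ddots & \vdots \\
p_{k,1}(x) & p_{k,2}(x) & \cdots & p_{k,k}(x)
\end{pmatrix}$$

be a $k$-by-$k$ matrix of polynomials in $x$ over $\mathcal{F}$ and let $R(x)$ be the determinant of $M(x)$.

Proposition 4.2.17 $R'(x)$, the derivative of $R(x)$, can be expressed as

$$R'(x) = \begin{vmatrix}
p'_{1,1}(x) & p'_{1,2}(x) & \cdots & p'_{1,k}(x) \\
p_{2,1}(x) & p_{2,2}(x) & \cdots & p_{2,k}(x) \\
\vdots & \vdots & \ddots & \vdots \\
p_{k,1}(x) & p_{k,2}(x) & \cdots & p_{k,k}(x)
\end{vmatrix}
+ \cdots +
\begin{vmatrix}
p_{1,1}(x) & p_{1,2}(x) & \cdots & p_{1,k}(x) \\
p_{2,1}(x) & p_{2,2}(x) & \cdots & p_{2,k}(x) \\
\vdots & \vdots & \ddots & \vdots \\
p'_{k,1}(x) & p'_{k,2}(x) & \cdots & p'_{k,k}(x)
\end{vmatrix},$$

where the $i$-th determinant in the sum is obtained from $M(x)$ by differentiating the entries of its $i$-th row.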


4.2.3 Presentation checking theorems

Since the propositions of the previous section concerned univariate polynomials, you may
be wondering how we are going to apply them to bivariate polynomials. The idea is to treat
the polynomials $P(x, y)$ and $E(x, y)$ as polynomials in $y$ over $\mathcal{F}(x)$, the field of rational
functions in $x$. $\mathcal{F}(x)$ is the field comprising terms of the form $p(x)/q(x)$, where $p(x)$ and
$q(x)$ are polynomials over $\mathcal{F}$ and $q(x)$ is not the zero polynomial. It is easy to verify that this is in fact a field.


We can now consider $P$ and $E$ as polynomials in $y$ with coefficients in $\mathcal{F}(x)$ by writing

$$P(x, y) = P_0(x) + P_1(x) y + \cdots + P_{\delta n + e}(x) y^{\delta n + e}$$

$$E(x, y) = E_0(x) + E_1(x) y + \cdots + E_{\delta n}(x) y^{\delta n}.$$

We will show that $E$ divides $P$ as a polynomial in $y$ over the field $\mathcal{F}(x)$. By Gauss' Lemma⁵,
this implies that $E$ divides $P$ over $\mathcal{F}[x]$, the ring of polynomials in $x$, which means that
$E(x, y)$ divides $P(x, y)$.

⁵The usual statement of Gauss' Lemma is that if a polynomial with integer coefficients can be factored
over the rationals, then it can be factored over the integers. A proof of Gauss' Lemma can be found in most
undergraduate algebra textbooks.

We will begin our proof by dividing $E$ and $P$ by their greatest common divisor. If that
greatest common divisor is not $E$, then we obtain two polynomials with no common factor.
To obtain a contradiction, we will show that these two polynomials have a common factor
when considered as polynomials in $y$ over $\mathcal{F}(x)$. By Gauss' Lemma, this will imply that
they share a common factor when considered as polynomials in $x$ and $y$.


Lemma 4.2.18 Let $E(x, y)$ be a polynomial of degree $(\alpha m, \beta n)$ and let $P(x, y)$ be a
polynomial of degree $(\alpha m + \delta m, \beta n + \epsilon n)$. If there exist distinct $x_1, \ldots, x_m$ such that $E(x_i, y)$ divides
$P(x_i, y)$ for $1 \le i \le m$, distinct $y_1, \ldots, y_n$ such that $E(x, y_i)$ divides $P(x, y_i)$ for $1 \le i \le n$
and if

$$1 > \alpha + \beta + \delta + \epsilon,$$

then $E(x, y)$ divides $P(x, y)$.


Proof: Assume, without loss of generality, that $\beta \ge \alpha$. Let $F(x, y)$ be the largest
common factor of $P(x, y)$ and $E(x, y)$. Assume by way of contradiction that $F \ne E$ and
that $F(x, y)$ has degree $(a, b)$. Set

$$P(x, y) = \bar{P}(x, y) F(x, y) \text{ and } E(x, y) = \bar{E}(x, y) F(x, y).$$

We will now divide $P$ and $E$ by $F$ and apply the lemma to $\bar{P}$ and $\bar{E}$ over the rows and
columns on which $F$ is not identically zero. The conditions of the lemma are satisfied by $\bar{P}$
and $\bar{E}$ on this domain because $F$ can be identically zero on at most $b$ rows and $a$ columns,
and

$$\alpha + \beta + \delta + \epsilon \ge \frac{\alpha m - a}{m - a} + \frac{\delta m - a}{m - a} + \frac{\beta n - b}{n - b} + \frac{\epsilon n - b}{n - b}.$$

Thus, we can assume without loss of generality that $P(x, y)$ and $E(x, y)$ have no common
factors. We will use this assumption to obtain a contradiction. Write


P(x, y) = Po(a) + Pi(w)y + 22+ + Ponsen(a yy Oro” 


E(a@,y) Eo(2) + Ei(a)y + +++ + Ban(a)y™, 
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and form the matrix M(P, F)(#) = 


Poppeyn(®) «-. a Po(t) ... 0 
0 Ppte)n (x) Po(x) 
Egn(2) E(x) 0 0 
0 (9+ )n 
0 0 Ean (2) E(x) 


$R(P, E)(x)$, the resultant of $P$ and $E$, is the determinant of $M(P, E)(x)$ and can therefore be
viewed as a polynomial in $x$. $M(P, E)(x)$ has $\beta n$ rows of coefficients of $P$ and
$(\beta + \epsilon)n$ rows of coefficients of $E$, so $R(P, E)(x)$ will be a polynomial of degree at most
$mn(\beta(\alpha + \delta) + (\beta + \epsilon)\alpha)$. We will show that $R(P, E)(x)$ is in fact the zero
polynomial by demonstrating that it has more than this many roots.

For $1 \le i \le m$, $E(x_i, y)$ divides $P(x_i, y)$, so we can see that the first $\beta n$ rows of
$M(P, E)(x_i)$ are dependent on the last $(\beta + \epsilon)n$ rows of $M(P, E)(x_i)$. This implies that
$M(P, E)(x_i)$ is a matrix of rank at most $(\beta + \epsilon)n$ (actually, the rank is exactly
$(\beta + \epsilon)n$). By Proposition 4.2.17, the $k$-th derivative of $R(P, E)(x)$ at $x_i$ is the sum of
determinants of matrices of rank at most $(\beta + \epsilon)n + k$. Since $M(P, E)(x)$ is a matrix of
side $(2\beta + \epsilon)n$, the $k$-th derivative of $R(P, E)(x)$ at $x_i$ is zero for $k < \beta n$. That is,
$R(P, E)(x)$ has a zero of multiplicity $\beta n$ at each of $x_1, \ldots, x_m$. Because we assumed that
$1 > \alpha + \beta + \delta + \epsilon$ and $\beta \ge \alpha$, we find

    $m(\beta n) > mn(\beta\alpha + \beta\delta + \beta^2 + \beta\epsilon) \ge mn(\beta\alpha + \beta\delta + \alpha\beta + \alpha\epsilon)$,

so $R(P, E)(x)$ must be the zero polynomial. Applying Proposition 4.2.16, we see that $E$
and $P$ must have a non-trivial common factor when considered as polynomials in $y$ over
$\mathcal{F}(x)$, which is a contradiction. □
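The role of the resultant in this argument can be illustrated with a small numerical sketch. The code below is not part of the original construction: it fixes a value of $x$, so that $P$ and $E$ become univariate polynomials in $y$ with rational coefficients, builds the corresponding Sylvester matrix, and evaluates its determinant. The function names (`sylvester_matrix`, `determinant`, `resultant`) are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction


def sylvester_matrix(p, e):
    """Sylvester matrix of two univariate polynomials.

    Polynomials are coefficient lists, highest degree first, so
    [1, 0, -1] stands for y^2 - 1. The matrix has deg(e) shifted rows
    of p followed by deg(p) shifted rows of e, mirroring M(P, E).
    """
    dp, de = len(p) - 1, len(e) - 1
    size = dp + de
    rows = []
    for shift in range(de):
        rows.append([0] * shift + list(p) + [0] * (size - dp - 1 - shift))
    for shift in range(dp):
        rows.append([0] * shift + list(e) + [0] * (size - de - 1 - shift))
    return rows


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # a zero column means the matrix is singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def resultant(p, e):
    """Resultant of p and e as the determinant of their Sylvester matrix."""
    return determinant(sylvester_matrix(p, e))


# y^2 - 1 and y - 1 share the factor y - 1, so the resultant vanishes.
common = resultant([1, 0, -1], [1, -1])
# y^2 - 1 and y - 2 share no factor, so the resultant is non-zero.
coprime = resultant([1, 0, -1], [1, -2])
```

In the proof, the same determinant is taken with polynomial entries in $x$; the sketch only shows the behavior at a single fixed $x$, where a shared factor forces the determinant to zero.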


We can now prove: 


Theorem 4.2.19 [Bivariate Checking] Let $\mathcal{F}$ be a field, let $X = \{x_1, \ldots, x_m\} \subset \mathcal{F}$,
and let $Y = \{y_1, \ldots, y_n\} \subset \mathcal{F}$. Let $R(x, y)$ be a polynomial over $\mathcal{F}$ of degree
$(d, n)$ and let $C(x, y)$ be a polynomial over $\mathcal{F}$ of degree $(m, e)$. If

    $\mathrm{Prob}_{(x,y) \in X \times Y}[R(x, y) \ne C(x, y)] < \delta^2$ and

    $1 > 2\frac{d}{m} + 2\frac{e}{n} + 2\delta$,

then there exists a polynomial $Q(x, y)$ of degree $(d, e)$ such that

    $\mathrm{Prob}_{(x,y) \in X \times Y}[R(x, y) \ne Q(x, y) \text{ or } C(x, y) \ne Q(x, y)] < 2\delta^2$,

    $\mathrm{Prob}_{x_0 \in X}[C(x_0, y) \equiv_y Q(x_0, y)] \ge 1 - 2\delta^2$, and

    $\mathrm{Prob}_{y_0 \in Y}[R(x, y_0) \equiv_x Q(x, y_0)] \ge 1 - 2\delta^2$.


Proof: Let $S$ be the set of points such that $R(x, y) \ne C(x, y)$. By Lemma 4.2.13, there
exists a polynomial $E(x, y)$ of degree $(\delta m, \delta n)$ such that $S$ is contained in the zero set
of $E$. By Lemmas 4.2.14 and 4.2.18, there exists a polynomial $Q(x, y)$ of degree $(d, e)$ such that

    $R(x, y) E(x, y) = C(x, y) E(x, y) = Q(x, y) E(x, y)$,

for all $(x, y) \in X \times Y$.

This implies that in any row on which $E(x, y)$ is not identically zero, $Q$ agrees with $R$ on
that entire row. However, $E$ can be identically zero on at most $\delta n$ rows; so, $E$ must be
non-zero on at least $(1 - \delta)n$ rows. Thus, $Q$ must agree with $R$ on at least $(1 - \delta)n$ rows.
We can similarly show that $Q$ must agree with $C$ on at least $(1 - \delta)m$ columns. We could stop
now, content in the knowledge that $R$ and $C$ agree on the intersection of $(1 - \delta)n$ rows and
$(1 - \delta)m$ columns; however, we will show that they agree on many more points.

As before, let $S$ be the set of points at which $R(x, y) \ne C(x, y)$. Let $T$ be the set of
points at which $R(x, y) = C(x, y)$, but $Q(x, y) \ne R(x, y)$. We will show that $|T| \le |S|$,
which will prove the first inequality. Call the rows on which $R$ disagrees with $Q$ bad and
define bad columns similarly. Let $b_r$ be the number of bad rows and let $b_c$ be the number
of bad columns. Call good any row or column that is not bad. We will say that a row and
column disagree if $R$ and $C$ take different values at their intersection. We first observe that
there can be at most $e + b_r$ points of $T$ in any bad column: if a column has more than $e + b_r$
points of $T$, then it must have at least $e + 1$ points in good rows at which $C$ agrees with
$R$ and therefore $Q$, implying that that column is in fact good. Because we assumed that
$1 > 2\delta + e/n$, every bad column must have at least $n/2$ points of $S$ in the intersection of
that column with the good rows. We can similarly analyze the bad rows to see that each
must have at least $m/2$ points of $S$ in the intersection of that row with the good columns.
We thereby conclude that $|T| \le |S|$ (the basic idea is that the points of $T$ must lie in the
shaded region of Figure 4-2). By examining the two regions in Figure 4-2 that contain the
points of $S$, we conclude

    $\frac{m}{2} b_r \le \delta^2 nm \implies \frac{b_r}{n} \le 2\delta^2$, and $\frac{n}{2} b_c \le \delta^2 nm \implies \frac{b_c}{m} \le 2\delta^2$.


Figure 4-2: The arrangement of bad rows and columns.


When we construct holographic proofs, we will need a version of Lemma 4.2.12 that
applies to trivariate presentations. We will not attempt to optimize this version.

While the following proof might look a little long, it is actually very simple: we will first
apply Theorem 4.2.19 in each $(w, x)$ plane and each $(w, y)$ plane to obtain a collection of
bivariate polynomials in $(w, x)$ and $(w, y)$ respectively. We will then treat these as univariate
polynomials in $x$ and $y$ that assume values that are functions in $w$, and then combine them
with another application of Theorem 4.2.19.


Corollary 4.2.20 Let $P(w, x, y)$, $R(w, x, y)$, and $C(w, x, y)$ be polynomials over a field
$\mathcal{F}$ of degrees $(c, 12d, 12e)$, $(12c, d, 12e)$, and $(12c, 12d, e)$ respectively. Let $W$, $X$, and $Y$
be subsets of $\mathcal{F}$ of sizes $12c$, $12d$, and $12e$ respectively. If

    $\mathrm{Prob}_{(w,x,y) \in W \times X \times Y}[P(w, x, y) = R(w, x, y) = C(w, x, y)] > 1 - \frac{\epsilon}{2916}$

for $\epsilon < 1$, then there exists a polynomial $Q(w, x, y)$ of degree $(c, d, e)$ such that

    $\mathrm{Prob}_{(w,x,y) \in W \times X \times Y}[Q(w, x, y) = P(w, x, y) = R(w, x, y) = C(w, x, y)] > 1 - \frac{2\sqrt{\epsilon}}{9}$.

Proof: We begin by observing that there exists a set $Y' \subset Y$ such that
$|Y'| \ge (1 - \frac{\sqrt{\epsilon}}{54}) |Y|$ and for each $y_0 \in Y'$,

    $\mathrm{Prob}_{(w,x) \in W \times X}[P(w, x, y_0) = R(w, x, y_0)] > 1 - \frac{\sqrt{\epsilon}}{54}$.

For each $y_0 \in Y'$, we now apply Theorem 4.2.19 to the polynomials obtained by restricting
$P$ and $R$ to $y_0$ to obtain a degree $(c, d)$ polynomial in $w$ and $x$, which we will call $R^{y_0}$, such
that

    $\mathrm{Prob}_{x_0 \in X}[R^{y_0}(w, x_0) \equiv_w P(w, x_0, y_0)] > 1 - \frac{2\sqrt{\epsilon}}{54}$.

For $y_0 \notin Y'$, we let $R^{y_0} = 0$. We are now going to treat the $R^{y_0}$'s as univariate polynomials
in $x$ that take on values that are functions of $w$. We then combine these together to form
a degree $(d, n)$ polynomial over $\mathcal{F}(w)$ that we will call $R^*$ by defining:

    $R^*(x_0, y_0) = R^{y_0}(w, x_0)$.

We see that

    $\mathrm{Prob}_{(x,y) \in X \times Y}[R^*(x, y) \equiv_w P(w, x, y)] > 1 - \frac{3\sqrt{\epsilon}}{54}$.

We now combine $C$ and $P$ in the same way to obtain a degree $(m, e)$ polynomial over
$\mathcal{F}(w)$ that we call $C^*$ such that

    $\mathrm{Prob}_{(x,y) \in X \times Y}[C^*(x, y) \equiv_w P(w, x, y)] > 1 - \frac{3\sqrt{\epsilon}}{54}$.

By combining these two inequalities, we find

    $\mathrm{Prob}_{(x,y) \in X \times Y}[C^*(x, y) \equiv_w R^*(x, y) \equiv_w P(w, x, y)] > 1 - \frac{\sqrt{\epsilon}}{9}$.

Since

    $1 > 2\frac{d}{12d} + 2\frac{e}{12e} + 2\sqrt{\frac{\sqrt{\epsilon}}{9}}$,

we can apply Theorem 4.2.19 to the polynomials $R^*$ and $C^*$ to obtain a degree $(d, e)$
polynomial $Q(x, y)$ that takes values in $\mathcal{F}(w)$ such that

    $\mathrm{Prob}_{(x,y) \in X \times Y}[Q(x, y) \equiv_w C^*(x, y) \equiv_w R^*(x, y) \equiv_w P(w, x, y)] > 1 - \frac{2\sqrt{\epsilon}}{9}$.



It remains to show that $Q(x, y)$ is a degree $c$ polynomial in $w$ as well. To do this, consider
a value $y_0 \in Y$ such that $Q(x, y_0)$ agrees with $P(w, x, y_0)$ for more than $d$ values of $x$. On
these values of $x$, $Q(x, y_0)$ is a degree $c$ polynomial in $w$. Moreover, for any other value of
$x$, the value of $Q(x, y_0)$ can be expressed as a linear combination with coefficients in $\mathcal{F}$ of
these polynomials, so $Q(x, y_0)$ will be a degree $c$ polynomial in $w$ for all $x \in X$. We now
observe by simple counting that there must be more than $e$ values of $y_0$ for which this is
true, so we can apply the same reasoning for these values of $y_0$, and then in the $x$ direction,
to see that $Q(x, y)$ is a degree $(c, d, e)$ polynomial in $w$, $x$ and $y$. □

This corollary implies that there is a trivariate presentation checking algorithm analogous
to the bivariate presentation checking algorithm.
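The constants in Corollary 4.2.20 can be traced with a short calculation. The following is a sketch under one assumption that is not spelled out in the text: each intermediate error bound is a multiple of $\sqrt{\epsilon}/54$, which is what the hypothesis $\epsilon/2916 = (\sqrt{\epsilon}/54)^2$ and the final application of Theorem 4.2.19 with $\delta^2 = \sqrt{\epsilon}/9$ suggest.

```latex
\begin{align*}
\frac{\epsilon}{2916} &= \left(\frac{\sqrt{\epsilon}}{54}\right)^{2}, \\
\frac{\sqrt{\epsilon}}{54} + 2\cdot\frac{\sqrt{\epsilon}}{54} &= \frac{3\sqrt{\epsilon}}{54}
  && \text{(rows outside } Y' \text{ plus the loss inside } Y'\text{)}, \\
2\cdot\frac{3\sqrt{\epsilon}}{54} &= \frac{\sqrt{\epsilon}}{9} = \delta^{2}
  \quad\Longrightarrow\quad \delta = \frac{\epsilon^{1/4}}{3}, \\
2\frac{d}{12d} + 2\frac{e}{12e} + 2\delta &= \frac{1}{3} + \frac{2\epsilon^{1/4}}{3} < 1
  && \text{for } \epsilon < 1, \\
2\delta^{2} &= \frac{2\sqrt{\epsilon}}{9}.
\end{align*}
```

The last line is the error bound that appears in the conclusion of the corollary.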


4.2.4 Using Bezout’s Theorem 


In this section, we describe a different way of proving Theorem 4.2.19. We use Bezout's
Theorem which, roughly stated, says:

Theorem 4.2.21 [Bezout] Let $p(x, y)$ and $q(x, y)$ be polynomials of degree $d$ and degree $e$
respectively. If $p$ and $q$ have more than $de$ zeroes in common (counting zeroes by multiplicity),
then $p$ and $q$ share a common factor.


We now pick up our proof where we left off at the end of Section 4.2.1. For simplicity 
of exposition, we will only consider the case in which n = m and d =e. The cases in which 


they are different are proved similarly. 


Proposition 4.2.22 Let $E(x, y)$ and $P(x, y)$ be degree $(\delta n, \delta n)$ and degree
$(d + \delta n, d + \delta n)$ polynomials over a field $\mathcal{F}$ such that $E(x_0, y)$ divides $P(x_0, y)$ as a
polynomial in $y$ for $5(d + \delta n)$ distinct values of $x_0$. Then, $E(x, y)$ divides $P(x, y)$ as a
polynomial in $x$ and $y$.


Proof: Let $x_1, \ldots, x_{5(d + \delta n)}$ be distinct values such that $E(x_i, y)$ divides $P(x_i, y)$ as a
polynomial in $y$. Assume that $E$ has degree exactly $\delta n$ in $y$ (it is to our advantage if
this is not the case) and assume that $E(x_i, y)$ has full degree in $y$ for $1 \le i \le 5d + 4\delta n$.
Because $E(x_i, y)$ divides $P(x_i, y)$, $E(x, y)$ and $P(x, y)$ share at least $\delta n$ zeroes^6 (counted by
multiplicities) along the line $x = x_i$, for each $1 \le i \le 5d + 4\delta n$. Because

    $\deg(E) \cdot \deg(P) \le 4\delta n (d + \delta n) < \delta n (5d + 4\delta n)$,

Bezout's Theorem implies that $P$ and $E$ share a common factor. If we divide $E$ and $P$
by this common factor and again apply the same argument, we eventually discover that
$E$ divides $P$ as a polynomial in $x$ and $y$. □
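The counting step in this proof can be written out term by term. The sketch below assumes, as the proof does, that the total degree of $E$ is at most $2\delta n$ and that of $P$ is at most $2(d + \delta n)$, and that each of the $5d + 4\delta n$ lines contributes at least $\delta n$ common zeroes.

```latex
\begin{align*}
\deg(E)\cdot\deg(P) &\le 2\delta n \cdot 2(d + \delta n) = 4\delta n d + 4\delta^{2} n^{2}, \\
\text{common zeroes} &\ge \delta n\,(5d + 4\delta n) = 5\delta n d + 4\delta^{2} n^{2}, \\
\text{common zeroes} - \deg(E)\cdot\deg(P) &\ge \delta n d > 0 .
\end{align*}
```

The surplus of $\delta n d$ common zeroes is what lets Bezout's Theorem apply.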


One can essentially replace Section 4.2.2 and Lemma 4.2.18 with an unbalanced version 
of Proposition 4.2.22. The bounds that we thereby obtain in Theorem 4.2.19 are slightly 


weaker, but still sufficient for the purposes of this chapter. 


^6 For simplicity, we ignore the $x_i$'s for which $E$ does not have full degree. However, if we look
back at Lemma 4.2.14, we see that $E$ and $P$ share zeroes at infinity at these $x_i$'s.


4.2.5 Sub-presentations and verification


It is clear that a presentation of a polynomial on a domain $\mathcal{D}$ contains presentations of that
polynomial on subsets of that domain. In this section, we will explain how to deal with
these sub-presentations. We begin with the following simple observation:


Proposition 4.2.23 Let $\mathcal{P}$ be an $\epsilon$-good $(d_x, d_y, d_z)$-presentation over a domain
$\mathcal{D} = X \times Y \times Z$, and let $\mathcal{D}' = X' \times Y' \times Z'$ be a subset of $\mathcal{D}$ such that

    $c|\mathcal{D}'| \ge |\mathcal{D}|$.

If we let $\mathcal{P}'$ consist of the elements of $\mathcal{P}$ that correspond to a presentation over $\mathcal{D}'$, then
$\mathcal{P}'$ is $c\epsilon$-good. Moreover, if

    $\frac{d_x}{|X|} + \frac{d_y}{|Y|} + \frac{d_z}{|Z|} + 2c\epsilon < 1$,

then the degree $(d_x, d_y, d_z)$ polynomial to which $\mathcal{P}'$ is close is the same polynomial to which
$\mathcal{P}$ is close.


Proof: The first part is obvious. The second follows from Lemma 4.2.5. □

Tri-variate verification algorithm (at $(x_0, y_0, z_0)$)
Remark: Assume that this algorithm is given a presentation $\mathcal{P}$ over $X \times Y \times Z$.
Remark: We will let $\epsilon$ be some small constant.

1. Check that the presentation $\mathcal{P}$ is $\epsilon$-good.

2. Check that the sub-presentation on $X \times Y \times z_0$ is $\epsilon$-good using the bivariate
presentation checking algorithm.

3. Choose a constant number of points of $X \times Y \times z_0$ at random and check that the
univariate polynomial of $\mathcal{P}$ in $z$ that goes through each takes the value assigned to
that point by the presentation.

4. Choose a constant number of points of $X \times y_0 \times z_0$ and check that the univariate
polynomial of $\mathcal{P}$ in $x$ that goes through each point takes the value assigned to that
point by the presentation.

5. Check that the univariate polynomial of $\mathcal{P}$ in $x$ that goes through $(x_0, y_0, z_0)$ agrees
with the value assigned to that point by $\mathcal{P}$.


Another type of sub-presentation is one of lower dimension. For example, a trivariate
presentation $\mathcal{P}$ over $X \times Y \times Z$ contains a bivariate presentation over $X \times Y \times z_0$, for
$z_0 \in Z$. However, the fact that $\mathcal{P}$ is $\epsilon$-good tells us little about this sub-presentation.

To show that the sub-presentation can be certified as good, we will show a stronger
property of presentations: they have simple verifiers. That is, once we know that a
presentation is $\epsilon$-good for a sufficiently small $\epsilon$, we can check whether any particular segment
of the presentation is the same as the segment of the presentation of the polynomial that our
presentation is close to.


Lemma 4.2.24 There exists a constant $c$ such that for all $\epsilon > 0$ there exist constants in
the trivariate verification algorithm such that if $\mathcal{P}$ is a $(d_x, d_y, d_z)$-presentation over a
domain $X \times Y \times Z$ where

    $c d_x < |X|$, $c d_y < |Y|$, and $c d_z < |Z|$, then

• If the verification algorithm passes Step 1 with probability at least 1/2, then there is a
polynomial $p(x, y, z)$ of degree $(d_x, d_y, d_z)$ such that $\mathcal{P}$ is $\epsilon$-close to a presentation of
$p(x, y, z)$.

• If the verification algorithm passes Steps 1, 2, and 3, with probability at least 1/2, then
the sub-presentation of $\mathcal{P}$ on $X \times Y \times z_0$ is $\epsilon$-close to a presentation of $p(x, y, z_0)$.

• If the verification algorithm passes Steps 1 through 4 with probability at least 1/2, then
the univariate polynomial of $\mathcal{P}$ in $x$ through $(\cdot, y_0, z_0)$ is $p(x, y_0, z_0)$.

• If the verification algorithm passes Steps 1 through 5 with probability at least 1/2, then
the value that $\mathcal{P}$ assigns to the point $(x_0, y_0, z_0)$ is $p(x_0, y_0, z_0)$.


Proof: These facts follow from Lemma 4.2.5, Lemma 4.2.12 and Corollary 4.2.20. □


This verification algorithm appears in [Sud92]. Our contribution is the realization that 


it works over domains whose size is linear in the degree of the presentations. 
4.3 A simple holographic proof system


4.3.1 Choosing a key problem 


We will base our holographic proof system on a particular NP-complete problem. We want 


to choose a problem such that 


• the descriptions of instances of problems like 3SAT and circuit satisfiability can be
easily and efficiently transformed into instances of our problem, and

• it is particularly easy to create a holographic proof system for our problem.


So that we can better understand the second objective, let us examine why it might be 
difficult to create a holographic proof system directly for the problem of circuit satisfiability. 
The usual way of describing an instance of circuit satisfiability is to describe a circuit by 
providing, for each gate in the circuit, its function and the names of its inputs. Checking 
whether an assignment satisfies a circuit can be a very irregular process: to compute the 
value of a gate in the circuit, one must first look up the names of the inputs to the gate and 
then their values. These gates could be anywhere in the circuit. When we try to design 
a holographic proof system directly for the circuit satisfiability problem, we find that the 
majority of our proof is devoted to making sure that the values of the gates are being 
shuttled around the proof correctly. The overhead that we thereby incur prevents us from 
finding an efficient holographic proof system for this problem. 

We will design a problem such that when we want to determine whether one of its 
constraints is satisfied, we will know instantly where we should find the values that we need 
to examine. In our problem, one will be able to compute the names of the variables involved 
in a constraint just from the name of the constraint. We thereby eliminate the overhead 
associated with the irregular structure of problems like circuit satisfiability. Moreover, it 
will be easy to reduce problems such as circuit satisfiability to instances of our problem. So 
that we can apply the tools of Section 4.2, we will need to make our problem simple in an 
algebraic sense, but we will discuss this later. 

The problem that we will use will be a coloring problem on a "wrapped de Bruijn
graph". An instance of the problem will consist of an assignment of a first color to each
node in the graph. A solution of the problem will be an assignment of a second color to
each node in the graph so that the pairs of colors assigned to each node and its neighbors
satisfy certain coloring rules. Each coloring rule will restrict the configuration of colors that
can be assigned to a node and its neighbors. The same rules will apply to every node in the
graph. We will call an instance of the coloring problem a coloring problem instance and a
solution of such an instance a coloring problem solution.


Let us recall the definition of a de Bruijn graph (see Figure 4-3 for an example). 


Definition 4.3.1 The de Bruijn graph $B_n$ is a directed graph on $2^n$ nodes in which each node
is represented by an $n$-digit binary string. The node represented by the string $(x_1, \ldots, x_n)$ has
edges pointing to the nodes represented by

    $(x_2, \ldots, x_n, x_1)$ and $(x_2, \ldots, x_n, x_1 \oplus 1)$,

where by $a \oplus b$ we mean the sum of $a$ and $b$ modulo 2.
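The definition can be made concrete with a few lines of code. This is a minimal sketch, assuming nodes are represented as tuples of bits; the function names `de_bruijn_neighbors` and `de_bruijn_graph` are hypothetical and serve only this illustration.

```python
from itertools import product


def de_bruijn_neighbors(node):
    """Out-neighbors of the node (x_1, ..., x_n): rotate left, then
    optionally flip the bit that wrapped around to the last position."""
    first, rest = node[0], node[1:]
    return [rest + (first,), rest + (first ^ 1,)]


def de_bruijn_graph(n):
    """Map each of the 2^n binary strings to its two out-neighbors."""
    return {node: de_bruijn_neighbors(node)
            for node in product((0, 1), repeat=n)}


graph = de_bruijn_graph(3)

# Count incoming edges to confirm that every node also has in-degree two.
in_degree = {node: 0 for node in graph}
for targets in graph.values():
    for target in targets:
        in_degree[target] += 1
```

Every node has exactly two outgoing and two incoming edges, which is the regularity that later makes the coloring constraints uniform across the graph.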


Figure 4-3: A de Bruijn graph 


We will define a wrapped de Bruijn graph to be the product of a de Bruijn graph with
a cycle, where in the product we will put an edge from vertex $(x, a)$ to vertex $(y, b)$ if and
only if there are edges from $x$ to $y$ and from $a$ to $b$ (see Figure 4-4). The size of the cycle
needs to be some constant times $n$ that is large enough for us to perform certain routing
operations on the graph; $5n$ will suffice.


Definition 4.3.2 The wrapped de Bruijn graph $B_n$ is a directed graph on $5n \cdot 2^n$ nodes in
which each node is represented by a pair consisting of a number modulo $5n$ and an $n$-digit
binary string. The node represented by the pair $(a, (x_1, \ldots, x_n))$ has edges pointing to the
nodes represented by

    $(a + 1, (x_2, \ldots, x_n, x_1))$ and $(a + 1, (x_2, \ldots, x_n, x_1 \oplus 1))$,

where $a + 1$ is taken modulo $5n$.


Figure 4-4: A wrapped de Bruijn graph
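The product construction described above can be sketched in code. The sketch assumes that the cycle is directed from position $a$ to position $a + 1$ modulo $5n$, which is one reading of the product definition; the function name `wrapped_de_bruijn` is hypothetical.

```python
from itertools import product


def wrapped_de_bruijn(n):
    """Product of the de Bruijn graph on n-bit strings with a directed
    cycle of length 5n (assumed orientation: a -> a + 1 mod 5n)."""
    cycle = 5 * n
    graph = {}
    for a in range(cycle):
        for bits in product((0, 1), repeat=n):
            first, rest = bits[0], bits[1:]
            nxt = (a + 1) % cycle
            # An edge exists when both coordinates follow an edge.
            graph[(a, bits)] = [(nxt, rest + (first,)),
                                (nxt, rest + (first ^ 1,))]
    return graph


wrapped = wrapped_de_bruijn(2)
```

For $n = 2$ this yields $5n \cdot 2^n = 40$ nodes, and the edges leaving the last column return to column 0.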
We use wrapped de Bruijn graphs for two reasons: one can route any permutation
on such a graph and, as we will see in Section 4.3.2, they have a very simple algebraic
description.

Our ability to route permutations on a wrapped de Bruijn graph enables us to "draw"
a circuit on one. We begin by considering an ordinary drawing of a circuit (Figure 4-5).
To draw the circuit on a wrapped de Bruijn graph, we will associate one column of the
wrapped de Bruijn graph with the gates of the circuit (some nodes of the column may not
be associated with any gate in the circuit). To each type of node that could appear in a
circuit, we will assign a color; thus, the association of nodes in the circuit with nodes in the
graph will be accomplished by coloring the nodes in the graph with the correct colors. For
the convenience of later arguments, we will also create a special "output" node.


Figure 4-5: A drawing of a circuit 


We now color the remaining nodes of the graph in a manner that describes the connections
that appear between the gates in the circuit. To each of these remaining nodes, we
associate a switching action such as one of those depicted in Figure 4-6 (again, by coloring
a node with a color associated with its switching action). Note that we allow a switch to
copy an incoming message and send it to both of its outputs (e.g. the switch in the lower
left-hand corner of Figure 4-6). We view the task of assigning the switching actions that
connect gates to their inputs as a routing problem in which each node has at most two
incoming packets. By using standard packet-routing techniques (see [Lei92]), we see that
$5n$ steps through a de Bruijn graph of $2^n$ nodes are sufficient to solve the routing problem.
Thus, we can find switching actions for each of the nodes in the wrapped de Bruijn graph
so that the output of each gate is routed to the inputs of those gates that need it (see
Figure 4-7).


Figure 4-6: Some switching actions



A proof that the circuit is satisfiable should consist of an assignment of 0's and 1's to
the inputs and the gates of the circuit that is consistent with a satisfying assignment (see
Figure 4-9). The translation of this proof into a second coloring of the graph will consist of
an assignment of 0's and 1's to the wires entering and leaving the nodes of the graph that
is consistent with the assignment, the actions of the gates, and the switching actions. Since
we are only supposed to color nodes of the graph, and not its wires, the proof will actually
assign each node a four-tuple of symbols that indicate whether the edges attached to the
node are 0, 1, or blank (Figure 4-8 contains some valid second colors). We will choose our
coloring rules so that the only legal colorings of the graph will be those that assign a 0 or 1
to each input, correctly propagate values along wires, correctly compute the value of every
gate, and produce a 1 at the output gate. Figure 4-9 contains a picture of a proof that
the circuit in Figure 4-5 is satisfiable. Figure 4-10 contains depictions of parts of second
colorings that violate the coloring rules.

Now, we have fully described what constitutes a graph coloring problem instance and a
graph coloring problem solution.
Figure 4-7: A drawing of the circuit on a wrapped de Bruijn graph. The left and right
columns of the graph should be identified with one another, but are drawn twice for the
convenience of the artist. Dark lines correspond to wires in the circuit.


Figure 4-8: Some legal second colors for the switches in Figure 4-6. 
Figure 4-9: Two proofs that one circuit is satisfied.


Figure 4-10: Some assignments of second colors that violate the coloring rules. 
Remark 4.3.3 The length of this type of description of a circuit only differs from the length
of a more conventional description by a constant factor. When one usually describes a circuit
of $m$ gates, one assigns a name to each gate and, for each gate, lists the operation that the
gate performs and the names of the gates that are its inputs. This description should have size
$\Theta(m \log m)$. Since we describe a circuit of $m$ gates by assigning one of a constant number of
colors to each node of a graph of size $O(m \log m)$, these descriptions differ in size by at most
a constant multiplicative factor.


4.3.2 Algebraically simple graphs 


In this section, we will show that wrapped de Bruijn graphs have a very simple algebraic
description. Actually, we obtain an algebraic description of graphs that we call extended de
Bruijn graphs, which are the product of the line graph on $5n + 1$ nodes and the de Bruijn
graph. The identification of the first and last columns is provided by the holographic
proof. We will use this description to translate our graph coloring problem into an algebraic
problem to which we can apply the machinery we developed in Section 4.2. We begin by
defining a graph on the elements of $GF(2^n)$.


Definition 4.3.4 A Galois graph $G_n$ is a directed graph on $2^n$ nodes in which each node is
identified with an element of $GF(2^n)$. Let $\alpha$ be a generator^7 of $GF(2^n)$. The node represented
by $\gamma \in GF(2^n)$ has edges pointing to the nodes represented by

    $\alpha\gamma$ and $\alpha\gamma + 1$.


Lemma 4.3.5 The Galois graph $G_n$ is isomorphic to the de Bruijn graph $B_n$.


Proof: We will begin by recalling a standard representation (the non-technical meaning
of representation) of $GF(2^n)$. Let $p(\alpha)$ be an irreducible polynomial of degree $n$ in
$GF(2)[\alpha]$ (i.e. a polynomial with coefficients in $GF(2)$ that is irreducible over $GF(2)$). Then

    $GF(2^n) = GF(2)[\alpha]/p(\alpha)$.

That is, the elements of $GF(2^n)$ can be represented as polynomials in $\alpha$ of degree at most
$n - 1$ with coefficients in $GF(2)$. Addition of these polynomials is performed component-wise.
To understand how multiplication behaves, let

    $p(\alpha) = \alpha^n + c_1 \alpha^{n-1} + \cdots + c_{n-1} \alpha + c_n$.

We multiply elements of $GF(2^n)$ by the standard multiplication algorithm for polynomials
(remember to add component-wise). If we obtain a polynomial of degree higher than $n - 1$,
we apply the relation

    $\alpha^n = c_1 \alpha^{n-1} + \cdots + c_{n-1} \alpha + c_n$

until we obtain a polynomial of degree at most $n - 1$.

We will represent the vertices of the de Bruijn graph on $2^n$ vertices by length $n$ vectors
of elements of $GF(2)$.

^7 A generator is an element such that $\alpha^{2^n - 1} = 1$ and $\alpha^k \ne 1$ for any $0 < k < 2^n - 1$. Every
element of $GF(2^n)$ can be represented by a polynomial of degree less than $n$ in $\alpha$ with
coefficients in $\{0, 1\}$.


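We define $\phi$, an isomorphism from vertices of the de Bruijn graph to vertices of $G_n$, by

    $\phi(b_1, b_2, \ldots, b_n) = \alpha^{n-1} b_1 + \alpha^{n-2}(b_2 + c_1 b_1) + \cdots + \left(b_n + \sum_{i=1}^{n-1} c_i b_{n-i}\right)$.

To see that this is an isomorphism, observe that in the de Bruijn graph, the edges leaving
$(b_1, \ldots, b_n)$ go to vertices

    $(b_2, b_3, \ldots, b_n, b_1)$ and $(b_2, b_3, \ldots, b_n, b_1 \oplus 1)$;

whereas in $G_n$, one edge leaving

    $\alpha^{n-1} b_1 + \alpha^{n-2}(b_2 + c_1 b_1) + \cdots + \left(b_n + \sum_{i=1}^{n-1} c_i b_{n-i}\right)$

goes to

    $\alpha\left[\alpha^{n-1} b_1 + \alpha^{n-2}(b_2 + c_1 b_1) + \cdots + \left(b_n + \sum_{i=1}^{n-1} c_i b_{n-i}\right)\right]$

    $= b_1\left(c_1 \alpha^{n-1} + \cdots + c_{n-1} \alpha + c_n\right) + \alpha^{n-1}(b_2 + c_1 b_1) + \cdots + \alpha\left(b_n + \sum_{i=1}^{n-1} c_i b_{n-i}\right)$

    $= \alpha^{n-1} b_2 + \alpha^{n-2}(b_3 + c_1 b_2) + \cdots + \alpha\left(b_n + \sum_{i=1}^{n-2} c_i b_{n-i}\right) + c_n b_1$,

and the other goes to

    $\alpha^{n-1} b_2 + \alpha^{n-2}(b_3 + c_1 b_2) + \cdots + \alpha\left(b_n + \sum_{i=1}^{n-2} c_i b_{n-i}\right) + c_n b_1 + 1$,

which we can easily verify are, in some order,

    $\phi(b_2, b_3, \ldots, b_n, b_1)$ and $\phi(b_2, b_3, \ldots, b_n, b_1 \oplus 1)$.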
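The isomorphism can be checked exhaustively for small $n$. The sketch below is an illustration rather than part of the proof: it assumes field elements are stored as integers whose bit $k$ is the coefficient of $\alpha^k$, takes the coefficients $c_1, \ldots, c_n$ of $p(\alpha)$ as a tuple, and compares the two out-neighbors in each graph as unordered pairs. The names `mul_alpha`, `phi`, and `is_isomorphism` are hypothetical.

```python
from itertools import product


def mul_alpha(y, n, modulus):
    """Multiply y in GF(2^n) by alpha; bit k of y is the coefficient of alpha^k."""
    y <<= 1
    if (y >> n) & 1:
        y ^= modulus  # apply alpha^n = c_1 alpha^(n-1) + ... + c_n
    return y


def phi(bits, c):
    """Map (b_1, ..., b_n) to a field element; c[i - 1] holds c_i."""
    n = len(bits)
    value = 0
    for j in range(1, n + 1):
        # Coefficient of alpha^(n-j) is b_j + sum_{i<j} c_i b_{j-i} over GF(2).
        coeff = bits[j - 1]
        for i in range(1, j):
            coeff ^= c[i - 1] & bits[j - i - 1]
        value |= coeff << (n - j)
    return value


def is_isomorphism(n, c):
    """Check that phi is a bijection carrying de Bruijn edges to Galois edges."""
    modulus = 1 << n
    for i, ci in enumerate(c, start=1):
        modulus |= ci << (n - i)
    nodes = list(product((0, 1), repeat=n))
    if len({phi(b, c) for b in nodes}) != 2 ** n:
        return False
    for b in nodes:
        image = mul_alpha(phi(b, c), n, modulus)
        galois_targets = {image, image ^ 1}
        de_bruijn_targets = {phi(b[1:] + (t,), c) for t in (0, 1)}
        if galois_targets != de_bruijn_targets:
            return False
    return True


# alpha^3 + alpha + 1 and alpha^4 + alpha + 1 are irreducible over GF(2).
ok_3 = is_isomorphism(3, (0, 1, 1))
ok_4 = is_isomorphism(4, (0, 0, 1, 1))
```

Comparing unordered pairs is deliberate: the images of the two de Bruijn neighbors agree with $\alpha\gamma$ and $\alpha\gamma + 1$ in every coefficient except possibly the constant term, so they may appear in either order.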


In our constructions of holographic proofs, we will actually want to identify the de Bruijn
graph with a graph over $GF(2^{n/2}) \times GF(2^{n/2})$ rather than with a graph over $GF(2^n)$. This
is easily accomplished:


Proposition 4.3.6 Let $n$ be even and let $\alpha$ be a generator of $GF(2^{n/2})$. Then, the graph on
$GF(2^{n/2}) \times GF(2^{n/2})$ in which the node represented by $(\sigma, \tau)$ has edges to the nodes
represented by

    $(\tau, \alpha\sigma)$ and $(\tau, \alpha\sigma + 1)$,

is isomorphic to the de Bruijn graph $B_n$.


Proof: By Lemma 4.3.5, we see that this graph is isomorphic to the graph on binary
strings of length $n$ in which the node $(b_1, \ldots, b_{n/2}, b_{n/2+1}, \ldots, b_n)$ has edges to the nodes

    $(b_{n/2+1}, \ldots, b_n, b_2, \ldots, b_{n/2}, b_1)$ and $(b_{n/2+1}, \ldots, b_n, b_2, \ldots, b_{n/2}, b_1 \oplus 1)$.

We now shuffle the $b_i$'s to see that this graph is isomorphic to the graph in which node
$(b_1, b_{n/2+1}, b_2, b_{n/2+2}, \ldots, b_{n/2}, b_n)$ has edges to the nodes

    $(b_{n/2+1}, b_2, b_{n/2+2}, \ldots, b_{n/2}, b_n, b_1)$ and $(b_{n/2+1}, b_2, b_{n/2+2}, \ldots, b_{n/2}, b_n, b_1 \oplus 1)$,

which is easily seen to be identical to the de Bruijn graph $B_n$. □


We will conclude this section by presenting a simple algebraic description of the extended 


de Bruijn graph. 


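Proposition 4.3.7 Let $n$ be even and let $\alpha$ be a generator of $\mathcal{F} = GF(2^{n/2})$. Let
$\mathcal{E} = \{1, \alpha, \ldots, \alpha^{5n}\}$, and let $\mathcal{E}' = \{1, \alpha, \ldots, \alpha^{5n-1}\}$. Then, the extended de Bruijn
graph on $(5n + 1)2^n$ vertices is isomorphic to the graph on $\mathcal{F} \times \mathcal{F} \times \mathcal{E}$ in which each vertex
$(x, y, z) \in \mathcal{F} \times \mathcal{F} \times \mathcal{E}'$ has edges to vertices

    $(y, \alpha x, \alpha z)$ and $(y, \alpha x + 1, \alpha z)$.


For convenience, we will often denote a triple $(x, y, z)$ by a vector, $\vec{x}$. We will then use
$\rho_1(\vec{x})$ and $\rho_2(\vec{x})$ to denote the neighbors of $\vec{x}$. Formally, $\rho_1$ and $\rho_2$ are given by

    $\rho_1 : (x, y, z) \mapsto (y, \alpha x, \alpha z)$

    $\rho_2 : (x, y, z) \mapsto (y, \alpha x + 1, \alpha z)$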


4.3.3 Arithmetizing the graph coloring problem 


In this section, we will use the algebraic description of the extended de Bruijn graphs given
by Proposition 4.3.7 to construct an algebraic version of our graph coloring problem. For
the purposes of this construction, we will define $\mathcal{F}$, $\mathcal{E}$, and $\mathcal{E}'$ as we did in Proposition 4.3.7.

Since we have identified the nodes of our graph with $\mathcal{F} \times \mathcal{F} \times \mathcal{E}$, we will view a coloring
of the graph as a function over this domain. To this end, we will choose a set $C \subset \mathcal{F}$ to
represent the set of allowable colors. Thus, a coloring problem instance can be viewed as a
function

    $T : \mathcal{F} \times \mathcal{F} \times \mathcal{E} \to C$.

So that it will jibe better with the machinery that we have developed, we will actually view
the coloring problem instance as a degree $(|\mathcal{F}|, |\mathcal{F}|, |\mathcal{E}|)$ polynomial over $\mathcal{F}$. We will similarly
view the coloring problem solution, $P(x, y, z)$, as a degree $(|\mathcal{F}|, |\mathcal{F}|, |\mathcal{E}|)$ polynomial. When
we speak of a presentation of a coloring problem solution, we mean a presentation of this
polynomial.
The local graph coloring rules will be described by a constant degree polynomial that
takes a constant number of variables. Let $\vec{x} \in \mathcal{F} \times \mathcal{F} \times \mathcal{E}'$ be a vertex of the graph. Assuming
that $T(\vec{x})$, $T(\rho_1(\vec{x}))$, $T(\rho_2(\vec{x}))$, $P(\vec{x})$, $P(\rho_1(\vec{x}))$, and $P(\rho_2(\vec{x}))$ are all in $C$, we can form a
constant degree polynomial $\chi$ which has the property that

    $\chi(T(\vec{x}), T(\rho_1(\vec{x})), T(\rho_2(\vec{x})), P(\vec{x}), P(\rho_1(\vec{x})), P(\rho_2(\vec{x}))) = 0$

if and only if the colors assigned by $T$ and $P$ to $\vec{x}$, $\rho_1(\vec{x})$, and $\rho_2(\vec{x})$ satisfy all the local
coloring rules. The reason that we can assume that $\chi$ has constant degree is that its value
is constrained only at the points of $C^6$, which is a finite set. Since the coloring rules are the
same throughout the graph, $\chi$ does not depend on $\vec{x}$.

In order to check that the polynomials $T$ and $P$ actually do map each node of the graph
to an element of $C$, rather than an arbitrary element of $\mathcal{F}$, we will form a constant degree
univariate polynomial $\psi(\gamma)$ which will be zero if and only if $\gamma \in C$.^8 Before we check that
the coloring rules described by $\chi$ are satisfied, we will first check that $\psi(P(\vec{x}))$ and $\psi(T(\vec{x}))$
evaluate to zero for each $\vec{x} \in \mathcal{F} \times \mathcal{F} \times \mathcal{E}$.

To check that the colors assigned to the last column of the extended de Bruijn graph
are the same as the colors assigned to the first column, we need merely check that for all
$(x, y) \in \mathcal{F} \times \mathcal{F}$,

    $T(x, y, 1) - T(x, y, \alpha^{5n}) = 0$ and $P(x, y, 1) - P(x, y, \alpha^{5n}) = 0$.


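We can now re-state the graph coloring problem as a problem concerning the existence
of certain polynomials: Given a degree $(|\mathcal{F}|, |\mathcal{F}|, |\mathcal{E}|)$ polynomial $T$ over $\mathcal{F}$, we say that
another degree $(|\mathcal{F}|, |\mathcal{F}|, |\mathcal{E}|)$ polynomial $P$ solves $T$ if

    (1) $\psi(P(\vec{x})) = 0$ and $\psi(T(\vec{x})) = 0$, for all $\vec{x} \in \mathcal{F} \times \mathcal{F} \times \mathcal{E}$,

    (2) $\chi(T(\vec{x}), T(\rho_1(\vec{x})), T(\rho_2(\vec{x})), P(\vec{x}), P(\rho_1(\vec{x})), P(\rho_2(\vec{x}))) = 0$, for all $\vec{x} \in \mathcal{F} \times \mathcal{F} \times \mathcal{E}'$,

    (3) $T(x, y, 1) - T(x, y, \alpha^{5n}) = 0$ and $P(x, y, 1) - P(x, y, \alpha^{5n}) = 0$, for all $(x, y) \in \mathcal{F} \times \mathcal{F}$.


^8 We can choose $\psi(\gamma) = \prod_{c \in C} (\gamma - c)$.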
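The membership polynomial $\psi$ from the footnote is easy to illustrate. The sketch below works over a small prime field $GF(p)$ purely to keep the arithmetic short, which is an assumption of this example: the construction in the text uses $GF(2^{n/2})$, where the same product formula applies. The name `psi` and the sample color set are hypothetical.

```python
def psi(gamma, colors, p):
    """Evaluate the product of (gamma - c) over all c in colors, modulo p."""
    result = 1
    for c in colors:
        result = result * (gamma - c) % p
    return result


# Assumed toy parameters: the field GF(13) and a three-element color set.
P_MOD = 13
COLORS = {1, 4, 6}

# psi vanishes exactly on the color set, because a product in a field
# is zero only when one of its factors is zero.
zeros = {g for g in range(P_MOD) if psi(g, COLORS, P_MOD) == 0}
```

The degree of $\psi$ equals the number of colors, which is why it is a constant degree polynomial when the color set has constant size.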
We can view the polynomial $T$ as a coloring problem instance and the polynomial $P$
as a coloring problem solution. We won't verify that all these conditions are satisfied by
examining the values of $T$ and $P$ at each point of $\mathcal{F} \times \mathcal{F} \times \mathcal{E}$ individually. Instead, we will
provide presentations of the polynomials $T(x, y, z)$ and $P(x, y, z)$, as well as some additional
information, so that the proof checker will need to examine only a constant number of
segments of each presentation. The details of this process are explained in the next section.


4.3.4 A Holographic Proof 


We will want to provide presentations of $P(x, y, z)$ and $T(x, y, z)$ that are verifiable (see
Section 4.2.5). To do this, we will choose domains $\mathcal{H}$ and $\mathcal{J}$ that contain $\mathcal{F}$ and $\mathcal{E}$ respectively,
and insist that $P$ and $T$ be presented over $\mathcal{H} \times \mathcal{H} \times \mathcal{J}$. The conditions of Lemma 4.2.24
require that $\mathcal{H}$ and $\mathcal{J}$ have sizes that are larger than $\mathcal{F}$ and $\mathcal{E}$ by a constant factor. Other
constraints on our choice of $\mathcal{H}$ and $\mathcal{J}$ will appear later in this section. For now, let us fix a
field $\mathcal{G}$ that contains the field $\mathcal{F}$, and insist that $\mathcal{H}$ and $\mathcal{J}$ be subsets of $\mathcal{G}$. Note that it is
possible to find such a field $\mathcal{G}$ whose size is the square of the size of $\mathcal{F}$.^9 We now describe
what the proof checker should do to verify that $P$ solves $T$, and what needs to be provided
with $P$ to enable the proof checker to do so. To anthropomorphize the situation, we imagine
that a proof provider wants to supply the proof checker with enough information to check
that $P$ solves $T$.


The techniques that we use in this section are derived from [Sud92]. 


Remark 4.3.8 Various unspecified constants will appear in this discussion. It should be clear
that sufficiently extreme choices for these constants will suffice to prove the lemma that
concludes the section.

STEP 1: When provided with presentations of $T$ and $P$, the proof checker should check
that these presentations are $\epsilon$-good, for some small constant $\epsilon$.


Remark 4.3.9 Every time the proof checker is provided with a presentation by the proof
provider, it will check that the presentation is $\epsilon$-good for some very small $\epsilon$. If the presentation
passes this test, then the proof checker will thereafter assume that the presentation actually
does represent some polynomial and that, whenever it queries a segment of the presentation, it
receives a segment of the presentation of that polynomial. We will choose $\epsilon$ to be sufficiently
small so that if the proof checker were mistaken in this assumption, this mistake would have
been detected with high probability during the checking.

^9 It is an elementary result from the theory of finite fields that $GF(2^{n/2})$ is a subfield of $GF(2^n)$.


We will now describe how the proof checker should verify that P and T satisfy condition (1).

Remark 4.3.10 Actually, the proof checker can only become confident that the polynomials to which the presentations of P and T are close satisfy relation (1). It cannot know anything about any particular piece of these presentations that it does not read. This is an important thought to keep in mind as one reads this section.


The proof checker will insist that the proof provider provide a presentation of the polynomials $\psi(P(x, y, z))$ and $\psi(T(x, y, z))$ on the domain $\mathcal{H} \times \mathcal{H} \times \mathcal{J}$. We will choose $\mathcal{H}$ and $\mathcal{J}$ sufficiently large so that these presentations are checkable and verifiable. Since $\psi$ is a constant degree polynomial, the degrees of $\psi(P(x, y, z))$ and $\psi(T(x, y, z))$ will be larger than the degrees of $P(x, y, z)$ and $T(x, y, z)$ by only a constant factor. Because the proof checker cannot be sure that the presentations that it is given actually do represent $\psi(P)$ and $\psi(T)$, we will refer to the presentations that the proof provider provides as $\widetilde{\psi(P)}$ and $\widetilde{\psi(T)}$.

STEP 2: The proof checker should check that the presentations $\widetilde{\psi(T)}$ and $\widetilde{\psi(P)}$ are $\epsilon$-good, for some small constant $\epsilon$.


Now that the proof checker has checked that the presentations of P, T, $\widetilde{\psi(P)}$, and $\widetilde{\psi(T)}$ are good, it will assume that they are close to presentations of polynomials of the appropriate degrees, and it will want to check that these polynomials actually are $\psi(P)$ and $\psi(T)$. It will do this by choosing random points $\vec{x}$ and verifying that $\psi(P(\vec{x})) = \widetilde{\psi(P)}(\vec{x})$. It will then do the same for T.

STEP 3: The proof checker should choose a constant number of points $\{\vec{x}_1, \ldots, \vec{x}_c\} \subset \mathcal{H} \times \mathcal{H} \times \mathcal{J}$. For each point $\vec{x}_i$, the proof checker should read $T(\vec{x}_i)$ and $P(\vec{x}_i)$, compute $\psi(T(\vec{x}_i))$ and $\psi(P(\vec{x}_i))$, and then check that these values agree with $\widetilde{\psi(T)}(\vec{x}_i)$ and $\widetilde{\psi(P)}(\vec{x}_i)$.


From Lemma 4.2.5, we know that if $\widetilde{\psi(P)}$ and $\widetilde{\psi(T)}$ were presentations of polynomials other than $\psi(P)$ and $\psi(T)$, then they would fail Step 3 with high probability. Thus, once the presentations have passed Step 3, the proof checker will be safe in assuming that whenever it reads a segment from $\widetilde{\psi(P)}$ or $\widetilde{\psi(T)}$, it actually reads a segment of the presentation of $\psi(P)$ or $\psi(T)$.
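The spot check of Step 3 can be illustrated in a much simpler univariate setting. The following Python sketch is only an illustration under stated assumptions: it works over the integers modulo a small prime instead of the field used in the text, it uses a hypothetical constraint polynomial psi(v) = v(v-1)(v-2), and it models each presentation as a function that can be queried at a point. The names P_MOD, poly_eval, psi, and spot_check are ours and do not appear in the construction above.

```python
import random

P_MOD = 10007  # hypothetical prime modulus standing in for the field of the text


def poly_eval(coeffs, x, p=P_MOD):
    """Evaluate a polynomial (lowest-degree coefficient first) at x modulo p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def psi(v, p=P_MOD):
    """Hypothetical constant-degree constraint: zero exactly when v is 0, 1, or 2."""
    return (v * (v - 1) * (v - 2)) % p


def spot_check(read_p, read_psi_p, trials, rng, p=P_MOD):
    """Accept iff psi(P(x)) equals the claimed value at every sampled point x."""
    for _ in range(trials):
        x = rng.randrange(p)
        if psi(read_p(x), p) != read_psi_p(x):
            return False
    return True


coeffs = [1, 4, 0, 2]                                # P(x) = 1 + 4x + 2x^3
read_p = lambda x: poly_eval(coeffs, x)
honest = lambda x: psi(read_p(x))                    # the correct psi(P)
cheating = lambda x: (psi(read_p(x)) + 1) % P_MOD    # a different polynomial

rng = random.Random(0)
print(spot_check(read_p, honest, 10, rng))    # True
print(spot_check(read_p, cheating, 10, rng))  # False
```

In this toy setting, two distinct polynomials of degree at most d agree on at most d points, so each random query exposes a wrong polynomial with probability at least 1 - d/p. The cheating polynomial above differs from the honest one everywhere, which is why it is rejected on the first query.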

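We will now describe how the proof provider will convince the proof checker that $\psi(P(\vec{x})) = 0$ for all $\vec{x} \in \mathcal{F} \times \mathcal{F} \times \mathcal{E}$. The proof for $\psi(T)$ will be similar, so we will just discuss the proof for $\psi(P)$.

The proof provider will provide presentations of polynomials that we will call $\Psi'$, $\Psi''$, and $\Psi'''$, where these polynomials are defined to be:

$$\Psi'(x, y, z) = \sum_{i=1}^{2^{n/2}} \psi(P(f_i, y, z))\, x^{i-1}, \qquad (*)$$

$$\Psi''(x, y, z) = \sum_{j=1}^{2^{n/2}} \Psi'(x, f_j, z)\, y^{j-1},$$

$$\Psi'''(x, y, z) = \sum_{l=0}^{5n} \Psi''(x, y, \alpha^{l})\, z^{l},$$

where $\{f_1, \ldots, f_{2^{n/2}}\}$ are the elements of $\mathcal{F}$. The purpose of these polynomials is best understood by observing that

$$\Psi'''(x, y, z) = \sum_{i=1}^{2^{n/2}} \sum_{j=1}^{2^{n/2}} \sum_{l=0}^{5n} \psi(P(f_i, f_j, \alpha^{l}))\, x^{i-1} y^{j-1} z^{l}.$$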


The polynomial $\Psi'''$ is the zero polynomial if and only if $\psi(P)$ is zero on all of $\mathcal{F} \times \mathcal{F} \times \mathcal{E}$. Moreover, we can efficiently check that each of $\Psi'$, $\Psi''$, and $\Psi'''$ is formed correctly and that $\Psi'''$ is the zero polynomial. We will call $\widetilde{\Psi}'$, $\widetilde{\Psi}''$, and $\widetilde{\Psi}'''$ the presentations that the proof provider provides which it claims are presentations of $\Psi'$, $\Psi''$, and $\Psi'''$.
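The idea of packing all of the values of $\psi(P)$ into the coefficients of one polynomial can be seen on a tiny example. The Python sketch below is an illustration under assumptions of our own: it works modulo a small prime, it takes the table of values as a nested list indexed from zero (so the exponent of each variable equals the index, where the text uses i-1 and j-1), and the names P_MOD and psi_triple_prime are hypothetical.

```python
P_MOD = 10007  # hypothetical prime modulus standing in for the field of the text


def psi_triple_prime(values, x, y, z, p=P_MOD):
    """Evaluate the sum over (i, j, l) of values[i][j][l] * x^i * y^j * z^l modulo p."""
    total = 0
    for i, plane in enumerate(values):
        for j, row in enumerate(plane):
            for l, v in enumerate(row):
                total += v * pow(x, i, p) * pow(y, j, p) * pow(z, l, p)
    return total % p


# Every value is zero, so the packed polynomial is the zero polynomial.
zero_table = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
# A single nonzero value (7 at index (1, 0, 1)) leaves a nonzero coefficient.
bad_table = [[[0, 0], [0, 0]], [[0, 7], [0, 0]]]

print(psi_triple_prime(zero_table, 2, 3, 5))  # 0
print(psi_triple_prime(bad_table, 2, 3, 5))   # 70
```

Because each table entry becomes the coefficient of a distinct monomial, the packed polynomial is identically zero exactly when every entry is zero, and a nonzero polynomial of low degree is nonzero at most points.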


STEP 4: The proof checker should check that the presentations $\widetilde{\Psi}'$, $\widetilde{\Psi}''$, and $\widetilde{\Psi}'''$ are $\epsilon$-good, for some small constant $\epsilon$. The proof checker should do the same for the analogous polynomials corresponding to $\psi(T)$.

STEP 5: The proof checker should choose a constant number of random points

$$\{(y_1, z_1), \ldots, (y_c, z_c)\} \subset \mathcal{H} \times \mathcal{J}.$$

For each point $(y_i, z_i)$, the proof checker should check that the univariate polynomial in x through $(y_i, z_i)$ in the presentation of $\Psi'$ and the corresponding polynomial in the presentation of $\psi(P)$ have the relation indicated by (*). The proof checker should then perform the analogous operations to verify that $\Psi''$ and $\Psi'''$ have been properly formed.

The proof checker can efficiently verify that the two univariate polynomials through $(y_i, z_i)$ have the relation indicated in (*) by performing a finite Fourier transform. This test is sufficient to convince the proof checker that $\widetilde{\Psi}'$ has been correctly formed because, after Step 4, the proof checker is confident that the presentation that purports to be $\Psi'$ is a presentation of a polynomial of the appropriate degree, and Lemma 4.2.5 implies that the test in Step 5 would fail with high probability if $\widetilde{\Psi}'$ were not the correct polynomial. The proof checker is similarly convinced that $\widetilde{\Psi}''$ and $\widetilde{\Psi}'''$ are presentations of the polynomials $\Psi''$ and $\Psi'''$ and that they satisfy the desired relations with $\psi(P)$.


STEP 6: The proof checker should choose a constant number of points

$$\{\vec{x}_1, \ldots, \vec{x}_c\} \subset \mathcal{F} \times \mathcal{F} \times \mathcal{E},$$

and verify that $\widetilde{\Psi}'''(\vec{x}_i) = 0$, for each i.

After performing Step 6, the proof checker will be confident that $\Psi''' = 0$ (this is another application of Lemma 4.2.5). If the proof has passed all these steps, then the proof checker can be confident that the presentations of P and T satisfy condition (1).

We will now explain how the proof checker will verify that P and T satisfy condition (2). 
This procedure will be very similar to the procedure for condition (1), so we will only 
describe in detail those places in which it differs. 

The proof checker will need access to presentations of $T(\rho_1(\vec{x}))$, $T(\rho_2(\vec{x}))$, $P(\rho_1(\vec{x}))$, and $P(\rho_2(\vec{x}))$. We will choose $\mathcal{H}$ and $\mathcal{J}$ so that these presentations are contained as sub-presentations of the presentations of T and P. Let $\mathcal{I}$ be a set so that $\mathcal{F} \subset \mathcal{I}$, $\mathcal{I} = \mathcal{I} + 1$, and $\mathcal{I}$ is some constant times larger than $\mathcal{F}$. We will let $\mathcal{H} = \mathcal{I} \cup \alpha\mathcal{I}$. We similarly choose $\mathcal{K}$ to be a set which contains $\mathcal{E}$ and which is larger by a constant factor. We then set $\mathcal{J} = \mathcal{K} \cup \alpha\mathcal{K}$. We now observe that the presentations of T and P on $\mathcal{H} \times \mathcal{H} \times \mathcal{J}$ contain sub-presentations of $T(\vec{x})$, $T(\rho_1(\vec{x}))$, $T(\rho_2(\vec{x}))$, $P(\vec{x})$, $P(\rho_1(\vec{x}))$, and $P(\rho_2(\vec{x}))$ on $\mathcal{I} \times \mathcal{I} \times \mathcal{K}$. Proposition 4.2.23 tells us that once we have verified that the presentations of P and T over $\mathcal{H} \times \mathcal{H} \times \mathcal{J}$ are $\epsilon$-good, we know that these sub-presentations over $\mathcal{I} \times \mathcal{I} \times \mathcal{K}$ are $8\epsilon$-good. Since it verified that the presentations of P and T were good in Step 1, the proof checker will assume that these other presentations are good as well, and that whenever it queries one of these presentations at a point, it receives the value of the polynomial to which the presentation is close.


The proof provider should provide a presentation of

$$\chi\bigl(T(\vec{x}),\, T(\rho_1(\vec{x})),\, T(\rho_2(\vec{x})),\, P(\vec{x}),\, P(\rho_1(\vec{x})),\, P(\rho_2(\vec{x}))\bigr)$$

on $\mathcal{I} \times \mathcal{I} \times \mathcal{K}$. We will call this presentation $\widetilde{\chi}$.

STEP 7: The proof checker should check that the presentation $\widetilde{\chi}$ is $\epsilon$-good for some small $\epsilon$. This is analogous to Step 2.

Since $\chi$, $\rho_1$ and $\rho_2$ are constant degree polynomials, the domains $\mathcal{I}$ and $\mathcal{K}$ only need to be larger than $\mathcal{F}$ and $\mathcal{E}$ by a constant multiplicative factor in order to make this step work. Now that the proof checker knows that $\widetilde{\chi}$ is close to the presentation of some polynomial, it needs to verify that this polynomial has been built correctly from the presentations of T and P.


STEP 8: This step is analogous to Step 3. The proof checker should choose a constant number of points $\{\vec{x}_1, \ldots, \vec{x}_c\} \subset \mathcal{I} \times \mathcal{I} \times \mathcal{K}$. For each point $\vec{x}_i$, the proof checker should read $T(\vec{x}_i)$, $T(\rho_1(\vec{x}_i))$, $T(\rho_2(\vec{x}_i))$, $P(\vec{x}_i)$, $P(\rho_1(\vec{x}_i))$, and $P(\rho_2(\vec{x}_i))$, and then check that

$$\widetilde{\chi}(\vec{x}_i) = \chi\bigl(T(\vec{x}_i),\, T(\rho_1(\vec{x}_i)),\, T(\rho_2(\vec{x}_i)),\, P(\vec{x}_i),\, P(\rho_1(\vec{x}_i)),\, P(\rho_2(\vec{x}_i))\bigr)$$

for each i.

From Lemma 4.2.5, we know that if $\widetilde{\chi}$ is not close to the correct polynomial, then this step will fail with high probability. Thus, once the presentation has passed Step 8, the proof checker will be safe in assuming that whenever it reads a segment from $\widetilde{\chi}$, it actually receives a segment of the presentation of $\chi$.

Now, the proof checker must check that $\chi$ is zero over $\mathcal{F} \times \mathcal{F} \times \mathcal{E}'$. This is done in exactly the same way as it was done for $\psi$.


STEPS 9, 10, and 11: Analogous to Steps 4, 5, and 6.

It only remains to check that condition (3) is satisfied. We will only explain how the proof checker does this for P, as the procedure for T is identical.

STEP 12: Check that the sub-presentations of P on $\mathcal{H} \times \mathcal{H} \times 1$ and $\mathcal{H} \times \mathcal{H} \times \alpha^{5n}$ are $\epsilon$-good for some small $\epsilon$. Check these on the same points and make sure that they agree on these points.

STEP 13: Check that the bivariate polynomials to which the presentations of P on $\mathcal{H} \times \mathcal{H} \times 1$ and $\mathcal{H} \times \mathcal{H} \times \alpha^{5n}$ are close are the restrictions of the trivariate polynomial to which the presentation of P is close, using Step 3 of the trivariate verification algorithm.

Note that if the proof checker does not perform Step 13, then it would be possible for the sub-presentations on $\mathcal{H} \times \mathcal{H} \times 1$ and $\mathcal{H} \times \mathcal{H} \times \alpha^{5n}$ to be identical but have nothing to do with the rest of the presentation of P.

We have now completed our description of our first holographic proof. Observe that the proof consists only of a constant number of presentations of polynomials on domains larger than $\mathcal{F} \times \mathcal{F} \times \mathcal{E}$ by a constant factor. We have proved:

Lemma 4.3.11 There exists a probabilistic algorithm V that expects as input a presentation of a coloring problem instance, a presentation of a coloring problem solution, and a constant number of additional presentations such that:

• we refer to the presentation of the coloring problem solution and the additional presentations as a holographic proof;

• the total size of the presentations in the holographic proof is at most $n \log^{O(1)} n$, where n is the size of the coloring problem instance;

• V only reads a constant number of segments, each of size at most $\sqrt{n} \log^{O(1)} n$, from each presentation;

• if the coloring problem instance is solvable, then there is a holographic proof that will cause V to accept with probability 1;

• if V accepts with probability greater than 1/2, then the inputs of V are close to a presentation of a coloring problem instance and a proof that that instance is solvable.


In the next section, we will see how to recursively apply this proof system to itself. 
4.4 Recursion


In this section, we will see how to recursively apply the holographic proofs constructed in 
Section 4.3. 

We begin by examining the operations that are performed by the checker of the holographic proofs constructed in Section 4.3.4 (i.e. the operations in Steps 1-13). We observe that only four types of operations are performed by the checker:

• checking whether a given polynomial is zero at a certain point,

• checking whether a given polynomial assumes a given value at a certain point,

• checking whether some constant degree polynomial evaluated at a given point takes a given value, and

• in Step 5, checking whether a given polynomial assumes a given set of values at a set of points, where the size of that set is smaller than the degree of the polynomial.


All of these operations can be performed by circuits of size $n \log^{O(1)} n$. The main idea behind our recursion will be to provide holographic proofs that each of these operations can be performed correctly so that the checker doesn't actually have to perform the operations. Thus, instead of reading a constant number of segments of size roughly $O(\sqrt{n})$ to check a holographic proof, the checker will be able to read a constant number of segments of size $O(\sqrt[4]{n})$ for each of the original segments. After k levels of recursion the checker will read $2^{O(k)}$ segments of size roughly $O(n^{1/2^{k}})$. We can then cap this recursion with Theorem 4.0.1 to obtain holographic proofs of size $O(n^{1+\epsilon})$ checkable by a constant number of queries, for any $\epsilon > 0$.
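The trade-off between the number of segments and their size can be tabulated with a few lines of Python. This is a rough sketch under assumptions of our own: it ignores the poly-logarithmic factors, counts a "level" as one application of the square-root reduction, and uses a hypothetical constant of 3 segments read per level. The name recursion_profile is ours.

```python
from math import isqrt


def recursion_profile(n, levels, queries_per_level=3):
    """Return (number of segments read, approximate segment size) after `levels` levels."""
    segments, size = 1, n
    for _ in range(levels):
        segments *= queries_per_level  # a constant number of reads per segment
        size = isqrt(size)             # each level takes a square root of the size
    return segments, size


for k in range(1, 5):
    print(k, recursion_profile(2 ** 32, k))
```

The number of segments grows as a constant raised to the power k while the segment size shrinks as $n^{1/2^{k}}$, which is why a modest number of levels already makes the segments very small.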

The two main tools that we will use to implement this idea are “encoded inputs” and 


the Fast Fourier Transform. 


4.4.1. Encoded inputs 


When Babai, Fortnow, Levin, and Szegedy [BFLS91] introduced transparent proofs, they presented the theorem candidate in an error-correcting code so that the proof checker would only have to read a constant number of its bits. This idea was vital to the recursions in the work of Arora and Safra [AS92b] and Arora, Lund, Motwani, Sudan, and Szegedy [ALM+92], and we will make use of it as well.

In Section 4.3, we saw how to create a holographic proof that a circuit has a satisfying 
assignment. A weakness of this construction is that the satisfying assignment appears 
within the holographic proof. For our recursion, we will want to construct many holographic 
proofs concerning one piece of data. Thus, we will need some way of checking that part of 
a satisfying assignment in a holographic proof is the same as a piece of external data. We 


want to do this without reading more than a constant number of bits of the external data. 


Figure 4-11: The proof checker needs to be sure that the proof shows that C' is satisfied by E(p(x)) and E(v), which are written elsewhere. (Diagram labels: the theorem "C' is satisfiable"; a holographic proof that C' is satisfied by E(p(x)) and E(v); the proof checker, which checks the proof and checks that the satisfying assignment in the proof actually is this assignment.)


For example, we might want to provide a holographic proof that a polynomial p(x) takes the value v at $x_0$, but we don't want to force the proof checker to read the entire description of p(x) or v. To do this, we first choose an asymptotically good error-correcting code, which we will call E for the rest of the chapter.¹⁰ Instead of writing the description of p(x) and v in the normal way, we insist that they be presented as E(p(x)) and E(v). Now, consider a circuit C that takes a polynomial and a value as input and accepts if that polynomial has that value when evaluated at $x_0$. We modify C to produce a circuit C' that expects its inputs to be encoded in the code E. C' will accept if and only if its inputs are codewords of E that encode a polynomial and a value that would cause C to accept. Our checker will ask for a holographic proof that C' accepts on inputs E(p(x)) and E(v). When the checker checks that the holographic proof is correct, it becomes convinced that there is a satisfying assignment of C', but it has not yet learned anything about that assignment. To become convinced that the satisfying assignment in the holographic proof actually is the same as its external data (which it presumes to be E(p(x)) and E(v)), the checker will read a few randomly chosen bits of its external data and check that these agree with the corresponding inputs represented in the holographic proof (see Figure 4-11). The checker can check the values of inputs represented in the holographic proof because the presentations in the proof are verifiable (as discussed in Section 4.1); so, the checker can verify the value of any gate in the circuit represented in the holographic proof.

¹⁰ Actually, it is unnecessary that E have constant rate, but we may as well assume that it does.

If we choose E to be a code with constant relative minimum distance then, after successfully comparing a constant number of bits, the checker will be convinced that the external data is close to a unique codeword of E and that that codeword is the same as the one in the satisfying assignment in the holographic proof. This is because, after checking the holographic proof, the checker is confident that its satisfying assignment contains codewords of E, and all codewords of E are far apart, so a piece of external data can agree with high probability with only one codeword of E. If we choose E to be a code in which it is easy to verify whether or not a word is in the code, then the circuit C' will have almost the same size as the circuit C. The codes that we constructed in Chapters 2 and 3 can be decoded in linear time, so these are well-suited to our purposes. If we use one of these codes, then the size of C' will exceed the size of C by at most a constant times the size of its inputs. We assume that E is one of these codes in the rest of this chapter.
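The comparison between the external data and the inputs represented in the holographic proof amounts to sampling a few positions and checking that they agree. The Python sketch below illustrates only that sampling step, under assumptions of our own: both words are given as plain lists of bits, positions are sampled uniformly and independently, and the names compare_samples and acceptance_probability are hypothetical.

```python
import random


def compare_samples(external, embedded, queries, rng):
    """Accept iff the two words agree on `queries` uniformly random positions."""
    assert len(external) == len(embedded)
    for _ in range(queries):
        i = rng.randrange(len(external))
        if external[i] != embedded[i]:
            return False
    return True


def acceptance_probability(relative_distance, queries):
    """Chance that every sampled position agrees when the words differ on the given fraction."""
    return (1 - relative_distance) ** queries


word = [0, 1, 1, 0, 1, 0, 0, 1]
complement = [1 - b for b in word]
rng = random.Random(0)
print(compare_samples(word, word, 5, rng))        # True
print(compare_samples(word, complement, 5, rng))  # False
print(acceptance_probability(0.25, 2))            # 0.5625
```

A word that differs from the embedded codeword on a constant fraction of positions survives q samples with probability that decays exponentially in q, so a constant number of comparisons suffices for constant confidence.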


4.4.2 Using the Fast Fourier Transform 


One's first inclination for how to recursively apply our construction of holographic proofs (encode every segment of every object in an error-correcting code and provide holographic proofs for each operation that the checker needs to perform) creates proofs that are a little too big. Consider what happens if we try this for the bivariate code that we presented in Section 4.2. There are $\Theta(\sqrt{n})$ univariate polynomials of degree $\Theta(\sqrt{n})$ in a presentation containing $\Theta(n)$ bits of data. The checker might want to evaluate any of these polynomials at any of $\Theta(\sqrt{n})$ points. The holographic proof that a polynomial of degree $\Theta(\sqrt{n})$ takes a certain value at a certain point will have size at least $\Theta(\sqrt{n})$. If we were to provide a separate proof for each of the $\Theta(\sqrt{n})$ polynomials at each of $\Theta(\sqrt{n})$ points, then these proofs would have total size at least $\Theta(n^{3/2})$.

To prevent this explosion, we will combine many of these proofs into one. For each polynomial, we will create just one proof that provides its values when it is evaluated at all of the points we are concerned with. We replace each presentation of a degree $(d_x, d_y, d_z)$ polynomial p over a domain $X \times Y \times Z$ with (see Figure 4-12)

• for each point in the domain, an encoding of the value of p at that point,

• for each univariate polynomial in the presentation, an encoding of the description of that polynomial, and

• for each univariate polynomial in the presentation, a holographic proof that presents its value for each point it intersects in the domain; this proof should be verifiable so that it can be used as a proof that any one of the encoded polynomials takes one of the encoded values at a point. The proof checker will need to provide a coloring problem instance that describes this relation.

For each of the four types of operations that the checker needs to perform (listed at the beginning of Section 4.4) we present a holographic proof that this operation has been performed correctly.

Aho, Hopcroft and Ullman [AHU74, Chapter 8.5] show that a polynomial can be efficiently evaluated at any reasonable set of points in a field using a few discrete Fourier transforms. Using the efficient implementation of the discrete Fourier transform over a finite field presented by Preparata and Sarwate [PS77], the univariate polynomials can be evaluated at each point in their domain by circuits of size $n \log^{O(1)} n$. Thus, if we use Lemma 4.3.11 to construct the holographic proofs, the total size of these objects is larger than the size of the presentation by at most a poly-logarithmic factor. Similarly, if we use Theorem 4.0.1 to construct the holographic proofs, then the total size of these objects is at most a polynomial in the size of the presentation.

Figure 4-12: A presentation is replaced for recursion. (Diagram labels: "Theorem (encoded)"; "Checker provides: p(x)(0,1,2) = (a,b,c)".)


We can now prove the first variation of our main holographic proof theorem. 


Theorem 4.4.1 [Efficient holographic proofs, variation 1] For all polynomial-time constructible functions $g(n) = O(\log\log n)$, there exists a polynomial time algorithm V (called a proof checker) that expects as input an encoding of a coloring problem instance T and a holographic proof $\Pi$ such that

• $|\Pi| = |T|^{1+2^{-g(n)}} (\log |T|)^{O(g(n))}$;

• V reads only $2^{O(g(n))}$ bits of T and $\Pi$;

• if the coloring problem instance T is solvable, then there is a holographic proof $\Pi$ that will cause V to accept with probability 1; moreover, $\Pi$ can be computed from a coloring problem solution of T in time $|\Pi| \log^{O(1)} |\Pi|$;

• if V accepts $(T, \Pi)$ with probability greater than 1/2, then T is close to an encoding of a coloring problem instance and $\Pi$ contains a presentation that is close to a solution of that instance.


Proof: We first apply Lemma 4.3.11 to itself g(n) times. To perform the recursion, we replace each presentation by encodings of its objects and a constant number of additional presentations as described in Sections 4.4.1 and 4.4.2. Note that the checker must provide presentations of coloring problem instances that say that the operations in the recursion are being performed correctly, while the proof must contain presentations of coloring problem solutions. These presentations are verifiable so that the checker can make sure that they actually refer to the encoded objects.

At this point, we have a proof $\Pi$ that has size $|T| (\log |T|)^{O(g(n))}$ which can be checked by examining $2^{O(g(n))}$ of its segments, each of which has size at most $|T|^{2^{-g(n)}} (\log |T|)^{O(g(n))}$. We now cap the recursion by applying Theorem 4.0.1 to construct the last set of holographic proofs. This will result in a polynomial blow-up in the size of each presentation. □
4.5 Efficient proof checkers


In this section, we explain how to make the checkers of the proofs presented in Section 4.4
run in poly-logarithmic time. We begin by examining the computations performed by the 
checker of Theorem 4.4.1. Most of the checker’s computation is devoted to constructing 
the coloring problem instances that are needed in the recursion. Fortunately, these coloring 
problem instances do not depend on the actual coloring problem being proved solvable. 
They are only statements like “there is a polynomial of degree d that evaluates to some 
value at x.” Thus, all of these presentations of coloring problem instances can be computed 
once and used in all proof checking to be done thereafter. Assuming that the proof checker 
has access to these presentations, it need only do poly-logarithmic work: the rest of its 
operations are choosing which parts of the proof to look at and checking proofs constructed 
by Theorem 4.0.1. Thus, we have proved the second variation of our holographic proof 


theorem: 


Theorem 4.5.1 [Efficient holographic proofs, variation 2] For all polynomial-time constructible functions $g(n) = O(\log\log n)$, there exists a table A and a probabilistic algorithm V (called a proof checker) that expects as input an encoding of a coloring problem instance T and a holographic proof $\Pi$ such that

• $|\Pi| = |T|^{1+2^{-g(n)}} (\log |T|)^{O(g(n))}$;

• V reads only $2^{O(g(n))}$ bits of T, $\Pi$, and A;

• if the coloring problem instance T is solvable, then there is a holographic proof $\Pi$ that will cause V to accept with probability 1; moreover, $\Pi$ can be computed from a coloring problem solution of T in time $|\Pi| \log^{O(1)} |\Pi|$;

• if V accepts $(T, \Pi)$ with probability greater than 1/2, then T is close to an encoding of a coloring problem instance and $\Pi$ contains a presentation that is close to a solution of that instance;

• V runs in time $\log^{O(1)} |T|$.
For all practical purposes, our checker now runs in poly-logarithmic time. The table of encodings of coloring problem instances can be written once and accessed forever. However, the need for this table can be eliminated if we use a coloring problem developed in [BFLS91].


4.6 The coloring problem of Babai, Fortnow, Levin, and Szegedy


Babai, Fortnow, Levin, and Szegedy set out to provide holographic versions of any proof of any mathematical statement. To achieve this end, they rely on the Kolmogorov-Uspenskii thesis, which implies that any proof can be efficiently expressed by a computation of a certain type of non-deterministic pointer machine [KU58]. They then observe that the computations of these pointer machines can be efficiently witnessed by restricted computations on a RAM. They translate these RAM computations into a coloring problem on a graph that is similar to the coloring problem we used in Section 4.3.1.¹¹ The advantage of the coloring problem in [BFLS91] is that the problem instance does not need to be as large as the problem solution.

In the coloring problem of [BFLS91], each node receives only one color. The problem instance is described by a coloring of an initial set of the nodes in the graph. The problem instance is solvable if there exists a coloring of the rest of the nodes in the graph that satisfies all the coloring rules. The type of problem instance that they have in mind is an encoding, by a good error-correcting code, of a proposed theorem. They demonstrate the existence of a set of local coloring rules such that it will be possible to complete the coloring of the graph in agreement with the rules only if the problem instance decodes to a statement of a provable theorem. Moreover, given a proof of the theorem, one can easily construct a coloring of the rest of the graph that satisfies the coloring rules. Actually, the techniques developed in [BFLS91] are even more general: they show that for any non-deterministic RAM algorithm, there is a finite set of coloring rules under which the rest of the nodes of the graph can be legally colored only if there is a computation of the algorithm that accepts on the input indicated by the initial coloring. The application to checking proofs is a special case in which the RAM algorithm checks proofs.

¹¹ Our coloring problem was inspired by theirs. It is weaker, but we hope it is easier to understand.

The graphs on which their coloring problem is defined are very similar to the graphs we used in Section 4.3.1: we can assume that their graphs are finite products of families of de Bruijn graphs and line graphs. Their coloring rules can be assumed to be the natural generalizations of coloring rules such as conditions (1), (2), and (3) of Section 4.3.1. Thus, instances of their graph coloring problem can be checked using algebraic techniques similar to those used in Sections 4.2 and 4.3.1.

The advantage of using the coloring problem of [BFLS91] in our recursion is that it 
eliminates the need for the checker to write coloring problem instances to check the recur- 
sion. For each type of relation that the checker needs to check, there is one uniform RAM 
algorithm that can perform the check. Thus, the checker need merely know the finite set 
of coloring rules associated with checking the computations associated with each of these 
algorithms, and verify that they are satisfied. 

Thus, by combining the algebraic machinery developed in Section 4.2 and the recursion 
developed in Section 4.4 with the coloring problem of [BFLS91], we can prove the final 


variation of our holographic proof theorem: 


Theorem 4.6.1 [Efficient holographic proofs, variation 3] For all polynomial-time constructible functions $g(n) = O(\log\log n)$, there exists a probabilistic algorithm V (called a proof checker) that expects as input an encoding of a theorem T, which we call a theorem candidate, and a holographic proof $\Pi$ such that

• if T encodes a theorem that has a proof P, then there is a holographic proof $\Pi$ that will cause V to accept with probability 1; moreover, $\Pi$ can be computed from P in time $|\Pi| \log^{O(1)} |\Pi|$;

• $|\Pi| = |P|^{1+2^{-g(n)}} (\log |P|)^{O(g(n))}$;

• V reads only $2^{O(g(n))}$ bits of T and $\Pi$;

• if V accepts $(T, \Pi)$ with probability greater than 1/2, then T is close to a unique encoding of a true theorem and $\Pi$ constitutes a proof of that theorem;

• V runs in time $\log^{O(1)} |\Pi|$.
For convenience, we state the following corollary:

Corollary 4.6.2 For all $\epsilon > 0$, there exists a probabilistic polynomial-time Turing machine V that accepts two inputs, a theorem candidate T and a witness $\Pi$. V reads only a constant number of bits of its input $\Pi$ and uses only $O(\log |\Pi|)$ random bits. If there is a proof P of T that can be verified by a pointer machine or a RAM in time t, then there is an input $\Pi$ of size at most $O(|t|^{1+\epsilon})$ that will cause V to accept with probability 1. Conversely, if there is an assignment of $\Pi$ that causes V to accept with probability greater than 1/2, then C' has a satisfying assignment.


CHAPTER 5 


Connections 


In this chapter, we will explore some connections between holographic proofs and expander 
codes. First, we wish to point out that current constructions of holographic proofs lead to 
codes that have checkers that read only a constant number of bits of their input: Let C’ be 
a circuit that is satisfied by every input, and consider the code consisting of the holographic 
proofs that C' is satisfied. If these proofs can be checked by examining a constant number 
of their bits (as those constructed in Theorem 4.0.1 and Theorem 4.6.1 can be), and if 
the proofs corresponding to different satisfying assignments have constant relative distance 
from each other, then the checker of these proofs is a checker in the sense of Definition 4.1.1. 
This code has a large number of words—one for each satisfying assignment—but it is still 
highly redundant (even those constructed in Section 4.4 have too much redundancy from 
the perspective of a coding theorist). For this chapter, we will define a checkable code to be a code of constant relative minimum distance and at most polynomial redundancy that has a checker that reads only a constant number of bits of its input. All known constructions of checkable codes go through the "PCP-Theorem" (Theorem 4.0.1).

If we examine a construction of a checkable code carefully, we see that the probability that its checker rejects a word is roughly proportional to the distance of that word from the nearest codeword.¹ This rough proportionality is necessary. If a checker examines only k bits of its input, and if it accepts all codewords, then the probability that it will reject a word of relative distance $\epsilon$ from a codeword is at most $\epsilon k$. Moreover, we know that there is some constant $\delta$ such that if a word has relative distance at least $\delta$ from every codeword, then the checker must reject that word with probability at least 1/2. Since the checker only examines k bits of its input, the probability that it rejects a word must change gradually over the space of words.

¹ Moreover, the checkable codes that are used in the proof of Theorem 4.0.1 all have this property.
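The bound of $\epsilon k$ can be checked numerically in a simple model. The sketch below is an illustration under an assumption that the text does not state: each of the k queried positions is taken to be uniformly distributed and independent of the others, so that the chance of touching a corrupted position is exactly $1 - (1 - \epsilon)^{k}$. The name hit_probability is ours.

```python
from fractions import Fraction


def hit_probability(eps, k):
    """Chance that at least one of k independent uniform queries lands on a corrupted position."""
    return 1 - (1 - eps) ** k


eps = Fraction(1, 100)  # relative distance of the word from the codeword
for k in (1, 3, 10):
    print(k, hit_probability(eps, k), eps * k)
```

A checker that accepts every codeword can reject only if it reads at least one position in which the word differs from the codeword, so in this model its rejection probability is at most the value computed above, which never exceeds $\epsilon k$.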

Expander codes almost have a checker with this property. Consider the following attempt at checking an expander code: choose a few constraints uniformly at random and accept the input word if it satisfies all of them. In the proof of Theorem 2.3.1, we proved that there is a constant $\delta$ such that if a word is within distance $\delta$ of a codeword, then the probability that this algorithm rejects the word is roughly proportional to its distance from the codeword. However, expander codes are not necessarily checkable. An expander code can have words that are far from any codeword but which satisfy all but one of its constraints.
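The attempted checker is easy to state in code. The following Python sketch computes its exact rejection probability for parity constraints, under assumptions of our own: constraints are tuples of bit positions whose sum must be even, they are sampled uniformly with replacement, and the example constraint system is a toy 6-cycle rather than an expander code. The names unsatisfied and rejection_probability are hypothetical.

```python
from fractions import Fraction


def unsatisfied(word, constraints):
    """Return the parity constraints (tuples of bit positions) that the word violates."""
    return [c for c in constraints if sum(word[i] for i in c) % 2 == 1]


def rejection_probability(word, constraints, samples):
    """Exact chance that `samples` uniformly drawn constraints include a violated one."""
    good = len(constraints) - len(unsatisfied(word, constraints))
    return 1 - Fraction(good, len(constraints)) ** samples


# Toy constraints: adjacent bits on a 6-cycle must be equal.
cycle = [(i, (i + 1) % 6) for i in range(6)]
print(rejection_probability([0, 0, 0, 0, 0, 0], cycle, 2))  # 0
print(rejection_probability([0, 0, 1, 0, 0, 0], cycle, 2))  # 5/9
```

The rejection probability depends only on the fraction of violated constraints, so a word that violates a single constraint out of m is rejected with probability about samples/m, no matter how far it is from the code.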

Consider again the random construction of expander codes in Section 2.3. It is very 
likely that some constraint will be independent of the others. That means that there are 
words that satisfy all constraints but this one. Moreover, such a word cannot be close to 
any codeword, because there can be no word with just one unsatisfied constraint that is 
close to a codeword. A similar argument shows that if there are two constraints that are 
not dependent on the others, then there will be words that satisfy all constraints but these 
two. Thus, if we are going to construct expander codes that are checkable by the means 
we suggested in this section, then there must be a high level of redundancy among the 
constraints imposed on the code. 
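
The link between independence of a constraint and the existence of a word violating only that constraint can be checked by brute force on a small example. The sketch below uses a hypothetical toy code (bits on the edges of the complete graph K4, one parity constraint per vertex); it is not an expander code and says nothing about distance, and the names `violated`, `isolatable`, `EDGES` and `VERTEX_CONSTRAINTS` are introduced here for illustration only.

```python
from itertools import product

def violated(constraints, word):
    """Indices of the parity constraints that `word` fails."""
    return [i for i, c in enumerate(constraints)
            if sum(word[j] for j in c) % 2 == 1]

def isolatable(constraints, n):
    """Constraint indices i such that some n-bit word violates i and nothing else."""
    found = set()
    for word in product((0, 1), repeat=n):
        bad = violated(constraints, word)
        if len(bad) == 1:
            found.add(bad[0])
    return sorted(found)

# Bits are the 6 edges of K4; vertex v demands even parity on its 3 incident edges.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
VERTEX_CONSTRAINTS = [tuple(j for j, e in enumerate(EDGES) if v in e)
                      for v in range(4)]
```

With all four vertex constraints, every edge lies in exactly two of them, so the constraints sum to zero and the number of violated constraints is always even: `isolatable(VERTEX_CONSTRAINTS, 6)` is empty. Dropping one vertex leaves three independent constraints, and `isolatable(VERTEX_CONSTRAINTS[:3], 6)` then returns all three indices. Redundancy among the constraints is what rules out words that fail just one test.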

It is interesting to point out that the checkable codes derived from holographic proofs 
can be decoded by the algorithms used to decode expander codes. Let S be a set of bits 
that a proof checker might examine. We can form a constraint on the bits in S by saying 
that the constraint is satisfied by a setting of the bits in S only if the proof checker would 
accept those bits. The constraints formed by examining all such sets are analogous to the 
constraints imposed on the expander codes. Moreover, current constructions of holographic 
proofs induce codes that can be decoded by simple variations of the sequential and parallel 
expander code decoding algorithms presented in Section 2.3 (these algorithms need to be 
altered slightly to deal with constraints that are not linear, but the general idea of flipping 
bits that are in more unsatisfied than satisfied constraints still works). 
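
The flipping idea carries over to arbitrary constraints if each constraint is represented as a set of positions together with an acceptance predicate. The following is a minimal sketch of a sequential variant under that assumption; it is not the algorithm of Section 2.3 itself. The names `flip_decode`, `parity`, `all_equal`, `TRIPLES` and the `max_rounds` cap are introduced here, and the cap is needed because, for non-linear predicates, a flip is not guaranteed to reduce the number of unsatisfied constraints.

```python
from itertools import combinations

def parity(bits):
    """Linear constraint: accept iff the bits have even parity."""
    return sum(bits) % 2 == 0

def all_equal(bits):
    """A constraint that is not a single parity check: accept iff all bits agree."""
    return len(set(bits)) == 1

def flip_decode(constraints, word, max_rounds=100):
    """Sequentially flip any bit that lies in more unsatisfied than satisfied constraints.

    `constraints` is a list of (positions, predicate) pairs; the predicate stands in
    for "the proof checker would accept these bits".
    """
    word = list(word)
    # For each bit, the constraints that read it.
    touching = [[(idx, pred) for idx, pred in constraints if i in idx]
                for i in range(len(word))]
    for _ in range(max_rounds):
        flipped = False
        for i, local in enumerate(touching):
            unsat = sum(1 for idx, pred in local
                        if not pred([word[j] for j in idx]))
            if 2 * unsat > len(local):  # strict majority of constraints unsatisfied
                word[i] ^= 1
                flipped = True
        if not flipped:
            break
    return word

# Hypothetical example: 5 bits, every triple of bits must agree.
# The only words satisfying all constraints are 00000 and 11111.
TRIPLES = [(t, all_equal) for t in combinations(range(5), 3)]
```

In this toy example `flip_decode(TRIPLES, [1, 0, 0, 0, 0])` returns `[0, 0, 0, 0, 0]`: bit 0 lies in six constraints, all unsatisfied, while every other bit lies in three unsatisfied and three satisfied constraints and is therefore left alone.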


5.1 Are checkable codes necessary for holographic proofs? 


It is conceivable that one could construct holographic proof systems that do not result in 
checkable codes. Consider the statement of Corollary 4.0.2. This statement is sufficient for 
most of the results derived from Theorem 4.0.1. Moreover, Corollary 4.0.2 does not say that 
there must be one holographic proof Π for each satisfying assignment of C or that these 
holographic proofs must be far apart. Thus, it is not clear that Corollary 4.0.2 leads to a 
construction of checkable codes. 

Arora [Aro94] takes a big step towards showing that statements such as Corollary 4.0.2 
actually do imply the existence of checkable codes. He essentially shows that if one is given 
a holographic proof system in which one can compute in polynomial time a satisfying as- 
signment of a circuit from any holographic proof that the circuit is satisfied, then, assuming 
that collision-intractable hash functions exist, the set of holographic proofs that a circuit 
is satisfiable is a code with constant relative minimum distance. All holographic proof sys- 
tems constructed so far have this property, and it is difficult to imagine one which does not. 
Arora then shows that any such proof system can be easily modified to yield a checkable 
code. 

In light of Arora’s results, we feel that the best hope for finding an alternative construc- 
tion of holographic proofs is to first try to find a direct construction of checkable codes. 
One possible way of doing this would be to develop a carefully controlled construction of 
expander codes in which there is high redundancy among the constraints. We know that it 
is possible to add a slight amount of redundancy to the constraints, but we do not know 
how to achieve a constant factor of redundancy. Another advantage of adding redundancy 
to the constraints is that this would increase the rate of the codes without affecting their 
error-correction ability. Thus, we feel that the problems of constructing checkable expander 
codes and better expander codes are intertwined with the problem of finding expander codes 
with highly redundant constraints. 